Optimizing MySQL query with expensive INNER JOIN - mysql

Using trial and error i've discovered that when removing a join from the below query it runs around 30 times quicker. Can someone explain why this would be and if it's possible to optimise the query to include the additional join without the performance hit.
This is a screenshot of the explain which shows that the index isn't being used for the uesr_groups table.
http://i.imgur.com/9VDuV.png
This is the original query:
SELECT `comments`.`comment_id`, `comments`.`comment_html`, `comments`.`comment_time_added`, `comments`.`comment_has_attachments`, `users`.`user_name`, `users`.`user_id`, `users`.`user_comments_count`, `users`.`user_time_registered`, `users`.`user_time_last_active`, `user_profile`.`user_avatar`, `user_profile`.`user_signature_html`, `user_groups`.`user_group_icon`, `user_groups`.`user_group_name`
FROM (`comments`)
INNER JOIN `users` ON `comments`.`comment_user_id` = `users`.`user_id`
INNER JOIN `user_profile` ON `users`.`user_id` = `user_profile`.`user_id`
INNER JOIN `user_groups` ON `users`.`user_group_id` = `user_groups`.`user_group_id`
WHERE `comments`.`comment_enabled` = 1
AND `comments`.`comment_content_id` = 12
ORDER BY `comments`.`comment_time_added` ASC
LIMIT 20
If I remove the "user_groups" join then the query runs 30 times quicker as mentioned above.
SELECT `comments`.`comment_id`, `comments`.`comment_html`, `comments`.`comment_time_added`, `comments`.`comment_has_attachments`, `users`.`user_name`, `users`.`user_id`, `users`.`user_comments_count`, `users`.`user_time_registered`, `users`.`user_time_last_active`, `user_profile`.`user_avatar`, `user_profile`.`user_signature_html`
FROM (`comments`)
INNER JOIN `users` ON `comments`.`comment_user_id` = `users`.`user_id`
INNER JOIN `user_profile` ON `users`.`user_id` = `user_profile`.`user_id`
WHERE `comments`.`comment_enabled` = 1
AND `comments`.`comment_content_id` = 12
ORDER BY `comments`.`comment_time_added` ASC
LIMIT 20
My tables are below, can anyone offer any insight into how to avoid a performance hit for including the user_groups table?
--
-- Table structure for table `comments`
--
CREATE TABLE IF NOT EXISTS `comments` (
`comment_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`comment_content_id` int(10) unsigned NOT NULL,
`comment_user_id` mediumint(6) unsigned NOT NULL,
`comment_original` text NOT NULL,
`comment_html` text NOT NULL,
`comment_time_added` int(10) unsigned NOT NULL,
`comment_time_updated` int(10) unsigned NOT NULL,
`comment_enabled` tinyint(1) NOT NULL DEFAULT '0',
`comment_is_spam` tinyint(1) NOT NULL DEFAULT '0',
`comment_has_attachments` tinyint(1) unsigned NOT NULL,
`comment_has_edits` tinyint(1) NOT NULL,
PRIMARY KEY (`comment_id`),
KEY `comment_user_id` (`comment_user_id`),
KEY `comment_content_id` (`comment_content_id`),
KEY `comment_is_spam` (`comment_is_spam`),
KEY `comment_enabled` (`comment_enabled`),
KEY `comment_time_updated` (`comment_time_updated`),
KEY `comment_time_added` (`comment_time_added`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=352 ;
-- --------------------------------------------------------
--
-- Table structure for table `users`
--
CREATE TABLE IF NOT EXISTS `users` (
`user_id` mediumint(6) unsigned NOT NULL AUTO_INCREMENT,
`user_ipb_id` int(10) unsigned DEFAULT NULL,
`user_activated` tinyint(1) NOT NULL DEFAULT '0',
`user_name` varchar(64) CHARACTER SET latin1 NOT NULL,
`user_email` varchar(255) NOT NULL,
`user_password` varchar(40) NOT NULL,
`user_content_count` int(10) unsigned NOT NULL DEFAULT '0',
`user_comments_count` int(10) unsigned NOT NULL DEFAULT '0',
`user_salt` varchar(8) NOT NULL,
`user_api_key` varchar(32) NOT NULL,
`user_auth_key` varchar(32) DEFAULT NULL,
`user_paypal_key` varchar(32) DEFAULT NULL,
`user_timezone_id` smallint(3) unsigned NOT NULL,
`user_group_id` tinyint(3) unsigned NOT NULL,
`user_custom_permission_mask_id` tinyint(3) unsigned DEFAULT NULL,
`user_lang_id` tinyint(2) unsigned NOT NULL,
`user_time_registered` int(10) unsigned NOT NULL,
`user_time_last_active` int(10) unsigned NOT NULL
PRIMARY KEY (`user_id`),
UNIQUE KEY `user_email` (`user_email`),
KEY `user_group_id` (`user_group_id`),
KEY `user_auth_key` (`user_auth_key`),
KEY `user_api_key` (`user_api_key`),
KEY `user_custom_permission_mask_id` (`user_custom_permission_mask_id`),
KEY `user_time_last_active` (`user_time_last_active`),
KEY `user_paypal_key` (`user_paypal_key`),
KEY `user_name` (`user_name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=33 ;
-- --------------------------------------------------------
--
-- Table structure for table `user_groups`
--
CREATE TABLE IF NOT EXISTS `user_groups` (
`user_group_id` tinyint(3) unsigned NOT NULL AUTO_INCREMENT,
`user_group_name` varchar(32) NOT NULL,
`user_group_permission_mask_id` tinyint(3) unsigned NOT NULL,
`user_group_icon` varchar(32) DEFAULT NULL,
PRIMARY KEY (`user_group_id`),
KEY `user_group_permission_mask_id` (`user_group_permission_mask_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=8 ;
-- --------------------------------------------------------
--
-- Table structure for table `user_profile`
--
CREATE TABLE IF NOT EXISTS `user_profile` (
`user_id` mediumint(8) unsigned NOT NULL,
`user_signature_original` text,
`user_signature_html` text,
`user_avatar` varchar(64) DEFAULT NULL,
`user_steam_id` varchar(64) DEFAULT NULL,
`user_ps_id` varchar(16) DEFAULT NULL,
`user_xbox_id` varchar(64) DEFAULT NULL,
`user_wii_id` varchar(64) DEFAULT NULL,
PRIMARY KEY (`user_id`),
KEY `user_steam_id` (`user_steam_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Most database engines calculate their query plan based on statistics about the tables - for instance, if a table has a small number of rows, it's quicker to go to the table than the index. Those statistics are maintained during "normal" operation - e.g. inserts, updates and deletes - but can get out of sync when table definitions are changed, or when you do bulk inserts.
If you see unexpected behaviour in the query plan, you can force the database to update its statistics; in MySQL you can use Optimize Table - which does everything, including re-ordering the table itself, or Analyze Table which only updates the indices.
This is hard to do on production environments, as both operations lock the tables; if you can possibly negotiate a maintenance window, that's by far the simplest way to deal with the problem.
It's worth measuring performance of "optimize table" - on well-specified hardware, it should take only a couple of seconds for "normal" size tables (up to low millions of records, with only a few indices). That might mean you can have an "informal" maintenance window - you don't take the application off-line, you just accept that some users will have degraded performance while you're running the scripts.

MySQL has an EXPLAIN feature which will help you to understand the query:
$ mysql
> EXPLAIN SELECT `comments`.`comment_id`, `comments`.`comment_html`,`comments`.`comment_time_added`, `comments`.`comment_has_attachments`, `users`.`user_name`, `users`.`user_id`, `users`.`user_comments_count`, `users`.`user_time_registered`, `users`.`user_time_last_active`, `user_profile`.`user_avatar`, `user_profile`.`user_signature_html`
FROM (`comments`)
INNER JOIN `users` ON `comments`.`comment_user_id` = `users`.`user_id`
INNER JOIN `user_profile` ON `users`.`user_id` = `user_profile`.`user_id`
WHERE `comments`.`comment_enabled` = 1
AND `comments`.`comment_content_id` = 12
ORDER BY `comments`.`comment_time_added` ASC
LIMIT 20
MySQL might simply be missing, or skipping an index.
You can learn more about understanding the output of EXPLAIN here from the documentation (a little hard-core), or better yet from a simpler explanation here, (ignore the fact that it's on a Java site.)
More than likely the amount of data, or an outdated or incomplete index is meaning that MySQL is falsely doing a table scan. When you see table scans, or sequential serches, you can often easily see which field is missing an index, or an index which is not usable.

Could you please try this one (you can remove join with user_group ). It can be faster in case if query retrieve small data set from comments table:
SELECT
comments.comment_id, comments.comment_html, comments.comment_time_added, comments.comment_has_attachments, users.user_name, users.user_id, users.user_comments_count, users.user_time_registered, users.user_time_last_active, user_profile.user_avatar, user_profile.user_signature_html, user_groups.user_group_icon, user_groups.user_group_name
FROM
(select * from comments where comment_content_id = 12 and active = 1) comments
INNER JOIN users u ON c.comment_user_id = users.user_id
INNER JOIN user_profile ON users.user_id = user_profile.user_id
INNER JOIN user_groups ON users.user_group_id = user_groups.user_group_id
ORDER BY comments.comment_time_added ASC
LIMIT 20

Try using left joins on the non null relations.
It seems that since inner joins are always symmetric mysql will reorder the joins to use best looking (typically smallest) table first.
Since left joins aren't always symmetric mysql won't reorder them and thus you can use them to force the table order. However with a non null field left and inner are equivalent so your results won't change.
The table order will determine what indicies are used which can greatly impact performance.

Related

How to optimize query by SUM of relations?

I have 3 simple tables
Invoices ( ~500k records )
Invoice items, one-to-many relation to invoices ( ~10 million records )
Invoice payments, one-to-many relation to invoices ( ~700k records )
Now, as simple as it sounds, I need to query for unpaid invoices.
Here is the query I am using:
select * from invoices
LEFT JOIN (SELECT invoice_id, SUM(price) as totalAmount
FROM invoice_items
GROUP BY invoice_id) AS t1
ON t1.invoice_id = invoices.id
LEFT JOIN (SELECT invoice_id, SUM(payed_amount) as totalPaid
FROM invoice_payment_transactions
GROUP BY invoice_id) AS t2
ON t2.invoice_id = invoices.id
WHERE totalAmount > totalPaid
Unfortunately, this query takes around 30 seconds, so way to slow.
Of course I have indexes set for "invoice_id" on both payments and items.
When I "EXPLAIN" the query, I can see that mysql has to do a full table scan.
I also tried several other query approaches, using "EXISTS" or "IN" with subqueries, but I never got around the full table scan.
Pretty sure there is not much that can be done here ( except use some caching approach ), but maybe someone knows how to optimize this ?
I need this query to run in a +/-2 seconds max.
EDIT:
Thanks to everybody for trying. Please just know that I absolutely know how to adopt different caching strategies here, but this question is purely about optimizing this query !
Here are the ( simplified ) table definitions
CREATE TABLE `invoices`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`date` date NOT NULL,
`title` enum ('M','F','Other') DEFAULT NULL,
`first_name` varchar(191) DEFAULT NULL,
`family_name` varchar(191) DEFAULT NULL,
`street` varchar(191) NOT NULL,
`postal_code` varchar(10) NOT NULL,
`city` varchar(191) NOT NULL,
`country` varchar(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`)
) ENGINE = InnoDB
CREATE TABLE `invoice_items`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`name` varchar(191) DEFAULT NULL,
`description` text DEFAULT NULL,
`reference` varchar(191) DEFAULT NULL,
`quantity` smallint(6) NOT NULL,
`price` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `invoice_items_invoice_id_index` (`invoice_id`),
) ENGINE = InnoDB
CREATE TABLE `invoice_payment_transactions`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`transaction_identifier` varchar(191) NOT NULL,
`payed_amount` mediumint(9) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `invoice_payment_transactions_invoice_id_index` (`invoice_id`),
) ENGINE = InnoDB
Plan A:
Summary table by invoice_id and day. (as Bill suggested) Summary Tables
Plan B:
Change the design to be "current" and "history". This is where the "payments" is a "history" of money changing hands. Meanwhile "invoices" would be "current" in that it contains a "balance_owed" column. This is a philosophy change; it could (should) be encapsulated in a client subroutine and/or a database Stored Procedure.
Plan C: This may be useful if "most" of the invoices are paid off.
Have a flag in the invoices table to indicate paid-off. That will prevent "most" of the JOINs from occurring. (Well, adding that column is just as hard as doing Plan B.)

Mysql Partitioning Query Performance

i have created partitions on pricing table. below is the alter statement.
ALTER TABLE `price_tbl`
PARTITION BY HASH(man_code)
PARTITIONS 87;
one partition consists of 435510 records. total records in price_tbl is 6 million.
EXPLAIN query showing only one partion is used for the query . Still the query takes 3-4 sec to execute. below is the query
EXPLAIN SELECT vrimg.image_cap_id,vm.man_name,vr.range_code,vr.range_name,vr.range_url, MIN(`finance_rental`) AS from_price, vd.der_id AS vehicle_id FROM `range_tbl` vr
LEFT JOIN `image_tbl` vrimg ON vr.man_code = vrimg.man_code AND vr.type_id = vrimg.type_id AND vr.range_code = vrimg.range_code
LEFT JOIN `manufacturer_tbl` vm ON vr.man_code = vm.man_code AND vr.type_id = vm.type_id
LEFT JOIN `derivative_tbl` vd ON vd.man_code=vm.man_code AND vd.type_id = vr.type_id AND vd.range_code=vr.range_code
LEFT JOIN `price_tbl` vp ON vp.vehicle_id = vd.der_id AND vd.type_id = vp.type_id AND vp.product_type_id=1 AND vp.maintenance_flag='N' AND vp.man_code=164
AND vp.initial_rentals_id =(SELECT rental_id FROM `rentals_tbl` WHERE rental_months='9')
AND vp.annual_mileage_id =(SELECT annual_mileage_id FROM `mileage_tbl` WHERE annual_mileage='8000')
WHERE vr.type_id = 1 AND vm.man_url = 'audi' AND vd.type_id IS NOT NULL GROUP BY vd.der_id
Result of EXPLAIN.
Same query without partitioning takes 3-4 sec.
Query with partitioning takes 2-3 sec.
how we can increase query performance as it is too slow yet.
attached create table structure.
price table - This consists 6 million records
CREATE TABLE `price_tbl` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`lender_id` bigint(20) DEFAULT NULL,
`type_id` bigint(20) NOT NULL,
`man_code` bigint(20) NOT NULL,
`vehicle_id` bigint(20) DEFAULT NULL,
`product_type_id` bigint(20) DEFAULT NULL,
`initial_rentals_id` bigint(20) DEFAULT NULL,
`term_id` bigint(20) DEFAULT NULL,
`annual_mileage_id` bigint(20) DEFAULT NULL,
`ref` varchar(255) DEFAULT NULL,
`maintenance_flag` enum('Y','N') DEFAULT NULL,
`finance_rental` decimal(20,2) DEFAULT NULL,
`monthly_rental` decimal(20,2) DEFAULT NULL,
`maintenance_payment` decimal(20,2) DEFAULT NULL,
`initial_payment` decimal(20,2) DEFAULT NULL,
`doc_fee` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`,`type_id`,`man_code`),
KEY `type_id` (`type_id`),
KEY `vehicle_id` (`vehicle_id`),
KEY `term_id` (`term_id`),
KEY `product_type_id` (`product_type_id`),
KEY `finance_rental` (`finance_rental`),
KEY `type_id_2` (`type_id`,`vehicle_id`),
KEY `maintenanace_idx` (`maintenance_flag`),
KEY `lender_idx` (`lender_id`),
KEY `initial_idx` (`initial_rentals_id`),
KEY `man_code_idx` (`man_code`)
) ENGINE=InnoDB AUTO_INCREMENT=5830708 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (man_code)
PARTITIONS 87 */
derivative table - This consists 18k records.
CREATE TABLE `derivative_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`der_cap_code` varchar(20) DEFAULT NULL,
`der_id` bigint(20) DEFAULT NULL,
`body_style_id` bigint(20) DEFAULT NULL,
`fuel_type_id` bigint(20) DEFAULT NULL,
`trans_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`model_code` bigint(20) DEFAULT NULL,
`der_name` varchar(255) DEFAULT NULL,
`der_url` varchar(255) DEFAULT NULL,
`der_intro_year` date DEFAULT NULL,
`der_disc_year` date DEFAULT NULL,
`der_last_spec_date` date DEFAULT NULL,
KEY `der_id` (`der_id`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`),
KEY `range_code` (`range_code`),
KEY `model_code` (`model_code`),
KEY `body_idx` (`body_style_id`),
KEY `capcodeidx` (`der_cap_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
range table - This consists 1k records
CREATE TABLE `range_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`range_name` varchar(255) DEFAULT NULL,
`range_url` varchar(255) DEFAULT NULL,
KEY `range_code` (`range_code`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY HASH is essentially useless if you are hoping for improved performance. BY RANGE is useful in a few use cases_.
In most situations, improvements in indexes are as good as trying to use partitioning.
Some likely problems:
No explicit PRIMARY KEY for InnoDB tables. Add a natural PK, if applicable, else an AUTO_INCREMENT.
No "composite" indexes -- they often provide a performance boost. Example: The LEFT JOIN between vr and vrimg involves 3 columns; a composite index on those 3 columns in the 'right' table will probably help performance.
Blind use of BIGINT when smaller datatypes would work. (This is an I/O issue when the table is big.)
Blind use of 255 in VARCHAR.
Consider whether most of the columns should be NOT NULL.
That query may be a victim of the "explode-implode" syndrome. This is where you do JOIN(s), which create a big intermediate table, followed by a GROUP BY to bring the row-count back down.
Don't use LEFT unless the 'right' table really is optional. (I see LEFT JOIN vd ... vd.type_id IS NOT NULL.)
Don't normalize "continuous" values (annual_mileage and rental_months). It is not really beneficial for "=" tests, and it severely hurts performance for "range" tests.
Same query without partitioning takes 3-4 sec. Query with partitioning takes 2-3 sec.
The indexes almost always need changing when switching between partitioning and non-partitioning. With the optimal indexes for each case, I predict that performance will be close to the same.
Indexes
These should help performance whether or not it is partitioned:
vm: (man_url)
vr: (man_code, type_id) -- either order
vd: (man_code, type_id, range_code, der_id)
-- `der_id` 4th, else in any order (covering)
vrimg: (man_code, type_id, range_code, image_cap_id)
-- `image_cap_id` 4th, else in any order (covering)
vp: (type_id, der_id, product_type_id, maintenance_flag,
initial_rentals, annual_mileage, man_code)
-- any order (covering)
A "covering" index is an extra boost, in that it can do all the work just in the index's BTree, without touching the data's BTree.
Implement a bunch of what I recommend, then come back (in another Question) for further tweaking.
Usually the "partition key" should be last in a composite index.

mysql query is slow, adding indexes doesnt work

I have a table with 100k+ rows but my queries are slow (they take about 3 seconds).
I tried making an index like this but this doesn't seem to do anything.
ALTER TABLE pm ADD INDEX (sender,reciever)
This is my query:
SELECT id,message FROM pm WHERE reciever = '28075' OR sender = '28075'
That takes 3 seconds more or less.
Explain of table PM
''
EXPLAIN of the query:
SHOW CREATE TABLE PM:
`CREATE TABLE `pm` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`datetime` int(11) NOT NULL,
`sender` int(11) NOT NULL,
`reciever` int(11) NOT NULL,
`users` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`readm` int(11) NOT NULL DEFAULT '0',
`forOp` int(11) NOT NULL DEFAULT '0',
`bussy` int(11) NOT NULL DEFAULT '0',
`bericht` longtext CHARACTER SET utf8mb4,
`aantal` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `users` (`users`(191)),
KEY `sender` (`sender`,`reciever`)
) ENGINE=InnoDB AUTO_INCREMENT=1637118 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci`
The reason the query could not use an index is that it uses OR, and your index can't be used to match the receiver (as a compound index requires that you match the leftmost column before matching the second one)
MySQL 5 added an index_merge which allows using multiple indexes for the same query, so if you have separate indexes on sender and receiver it could pick those.
An alternative would be to rewrite the query to use UNION and again use separate indexes instead of compound one:
SELECT id,message FROM pm WHERE reciever = '28075'
UNION
SELECT id,message FROM pm WHERE sender = '28075'
You can read more at this article

MySQL optimize count query

I've got a question about MySQL performance.
These are my tables:
(about 140.000 records)
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`label` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`title` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`intro` text COLLATE utf8_unicode_ci NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`date` int(11) NOT NULL,
`active` int(1) NOT NULL,
`language_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`indexed` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=132911 ;
(about 400.000 records)
CREATE TABLE IF NOT EXISTS `article_category` (
`article_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
RUNNING THIS COUNT QUERY:
SELECT SQL_NO_CACHE COUNT(id) as total
FROM (`article`)
LEFT JOIN `article_category` ON `article_category`.`article_id` = `article`.`id`
WHERE `article`.`language_id` = 1
AND `article_category`.`category_id` = '<catid>'
This query takes a lot of resources, so I am wondering how to optimize this query.
After executing it's beeing cached, so after the first run I am fine.
RUNNING THE EXPLAIN FUNCTION:
AFTER CREATING AN INDEX:
ALTER TABLE `article_category` ADD INDEX ( `article_id` , `category_id` ) ;
After adding indexes and changing LEFT JOIN to JOIN the query runs alot faster!
Thanks for these fast replys :)
QUERY I USE NOW (I removed the language_id because it was not that neccesary):
SELECT COUNT(id) as total
FROM (`article`)
JOIN `article_category` ON `article_category`.`article_id` = `article`.`id`
AND `article_category`.`category_id` = '<catid>'
I've read something about forcing an index, but I think thats not neccesary anymore because the tables are already indexed, right?
Thanks alot!
Martijn
You haven't created necessary index on the table
Table article_category - Create a compound index on (article_id, category_id)
Table article -Create a compound index on (id, language_id)
If this doesn't help post the explain statement.
The columns used in a JOIN condition should have an index, so you need to index article_id.

How to optimize this mysql join on large table?

I have a project where the admin needs to create multiple newsletters with some crawled posts from the web.
I insert the posts in posts table after crawling has completed and assign them a feed_id to identify the source. this is the structure of posts table (truncated):
CREATE TABLE `posts` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`identifier` varchar(255) DEFAULT NULL,
`published` timestamp NULL DEFAULT NULL,
`content` longtext,
...
...
`is_unread` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Every admin (user) has access to one or more "feeds". So in Newsletter creation page I want to show them a list of posts from the feeds they are allowed to see and also, I show a button to put the posts in specifict categories of that newsletter, if the user previously selected that post, I should show him that and let him remove it from the category. So I have some other tables too: newsletters, categories, newsletter_post, category_post. Here is their structures:
newsletters:
CREATE TABLE `newsletters` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`sent_at` timestamp NULL DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`date` date DEFAULT NULL,
`topic_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
categories:
CREATE TABLE `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`topic_id` int(11) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
newsletter_post:
CREATE TABLE `newsletter_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`newsletter_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
category_post:
CREATE TABLE `category_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`category_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So I'm using this query to find posts for the allowed feeds and check the status if a post is in a specific category of this specific newsletter:
SELECT DISTINCT `posts`.`id`, `published`, `posts`.`title`, `posts`.`content`, `source_name`, `category_id`, `newsletter_id`, `link_href`, categories.title as category_title
FROM `posts`
LEFT JOIN `category_post` ON `posts`.`id` = `category_post`.`post_id`
LEFT JOIN `categories` ON `categories`.`id` = `category_post`.`category_id`
LEFT JOIN `newsletter_post` ON `posts`.`id` = `newsletter_post`.`post_id`
LEFT JOIN `newsletters` ON `newsletters`.`id` = `newsletter_post`.`newsletter_id`
WHERE `feed_id` IN (6, 7) ORDER BY `posts`.`published` DESC LIMIT 40 OFFSET 0
but the problem is this is horrible and not optimized. My posts table contains up to 50,000 rows each month, and each row with 3~10kbs of data in avg., so sometimes when I try to run the query (which is frequently run by the admin to make the newsletter, pagination etc) mysql shows this error: too much rows to join, etc. and most of the times its really slow.
and the reason I'm doing all this in one query is because I want the result to be in one json response so I can show them the user quickly without doing additional requests.
I wanna know if there is a better way to do this query or use indexes or something else.
Thanks you in advance for your help.
index your posts table on
( feed_id, published )
so the data is already optimized for your WHERE clause, and pre-sorted to help your ORDER BY.
For reading querys that have a lot of demand, InnoDB is very inefficient. I recommend you to use a NoSQL Database but if you don't want or the cost of change is too much... you can try this:
1) LIKE Sallar Kaboli told you, you have to index your tables in columns that use in JOIN querys. For example:
CREATE INDEX index1 ON newsletter_post (post_id);
2) USE only important columns for JOINS.
I mean, you have to only use the columns that use in SELECT part of query.
I hope this'd be helpful.
To complete other answers, I suggest to change this types on posts table:
1) Change feed_id to int(4). Really you have more than int(4) feeds?
2) Change is_unread to bit instead of int(1). I should say that this may not improve your given query in the question but according to the field name, the correct type is bit.
Another more improvement to this answer is that never use default int(11) for numeric or id fields, assign types more specific. Using smaller size of types will improve your indexes also. I don't think you need more than int(4) for fields id.
For example indexing and querying int(3) column is more faster than int(11).
Please create the following indexes indexes on ::
1) `post_id` in `category_post`
2) `post_id` in `newsletter_post`