How to optimize this mysql join on large table? - mysql

I have a project where the admin needs to create multiple newsletters with some crawled posts from the web.
I insert the posts into the posts table after crawling has completed and assign each one a feed_id to identify the source. This is the structure of the posts table (truncated):
CREATE TABLE `posts` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`identifier` varchar(255) DEFAULT NULL,
`published` timestamp NULL DEFAULT NULL,
`content` longtext,
...
...
`is_unread` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Every admin (user) has access to one or more "feeds". On the newsletter creation page I want to show them a list of posts from the feeds they are allowed to see, along with a button to put a post into a specific category of that newsletter. If the user has previously selected a post, I should show that and let them remove it from the category. So I have some other tables too: newsletters, categories, newsletter_post, category_post. Here are their structures:
newsletters:
CREATE TABLE `newsletters` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`sent_at` timestamp NULL DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`date` date DEFAULT NULL,
`topic_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
categories:
CREATE TABLE `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`topic_id` int(11) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
newsletter_post:
CREATE TABLE `newsletter_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`newsletter_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
category_post:
CREATE TABLE `category_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`category_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So I'm using this query to find posts for the allowed feeds and to check whether a post is already in a specific category of this specific newsletter:
SELECT DISTINCT `posts`.`id`, `published`, `posts`.`title`, `posts`.`content`, `source_name`, `category_id`, `newsletter_id`, `link_href`, categories.title as category_title
FROM `posts`
LEFT JOIN `category_post` ON `posts`.`id` = `category_post`.`post_id`
LEFT JOIN `categories` ON `categories`.`id` = `category_post`.`category_id`
LEFT JOIN `newsletter_post` ON `posts`.`id` = `newsletter_post`.`post_id`
LEFT JOIN `newsletters` ON `newsletters`.`id` = `newsletter_post`.`newsletter_id`
WHERE `feed_id` IN (6, 7) ORDER BY `posts`.`published` DESC LIMIT 40 OFFSET 0
The problem is that this query is horrible and not optimized. My posts table grows by up to 50,000 rows each month, and each row holds 3~10 KB of data on average, so sometimes when I run the query (which the admin runs frequently while building the newsletter, paginating, etc.) MySQL fails with an error along the lines of "too many rows to join", and most of the time it is really slow.
The reason I'm doing all this in one query is that I want the result in a single JSON response, so I can show it to the user quickly without additional requests.
I want to know if there is a better way to write this query, use indexes, or do something else.
Thank you in advance for your help.

Index your posts table on
( feed_id, published )
so that the rows are already organized for your WHERE clause and pre-sorted to help your ORDER BY.
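As a minimal sketch of that suggestion, the index could be created and checked as follows. The index name idx_posts_feed_published is a hypothetical choice, and the EXPLAIN query is a trimmed-down version of the listing query from the question, not the full five-table join.

```sql
-- Composite index: feed_id serves the WHERE filter, published helps the ORDER BY.
-- The index name is an assumption; any unused name works.
ALTER TABLE `posts`
  ADD INDEX `idx_posts_feed_published` (`feed_id`, `published`);

-- Check whether the optimizer actually picks the new index.
EXPLAIN
SELECT `id`, `published`
FROM `posts`
WHERE `feed_id` IN (6, 7)
ORDER BY `published` DESC
LIMIT 40 OFFSET 0;
```

If the key column of the EXPLAIN output still shows NULL, the index is not being used and the query or the index needs another look.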

For read queries under heavy demand, InnoDB can be inefficient. I would recommend a NoSQL database, but if you don't want that or the cost of changing is too high, you can try this:
1) As Sallar Kaboli told you, index the columns that your tables use in JOINs. For example:
CREATE INDEX index1 ON newsletter_post (post_id);
2) Use only the columns you need. That is, join and select only the columns that actually appear in the SELECT part of the query.
I hope this helps.

To complete the other answers, I suggest changing these types on the posts table:
1) Change feed_id to a smaller integer type. Do you really have more feeds than a SMALLINT can hold? Note that the number in int(4) is only a display width and does not change the storage size; it is the type itself (TINYINT, SMALLINT, MEDIUMINT, INT) that determines how many bytes a value takes.
2) Change is_unread to bit instead of int(1). This may not improve the query in the question, but judging by the field name, bit is the correct type.
As a further improvement, never use the default int(11) for every numeric or id field; pick the smallest type that fits the data. Smaller types also make your indexes smaller. Check whether your id fields really need a full INT.
For example, indexing and querying a SMALLINT column is generally cheaper than an INT column.
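A sketch of what narrowing these two columns might look like. It assumes there will never be more than 65,535 feeds and that feed ids are never negative; neither is stated in the question. In MySQL the storage size comes from the type (SMALLINT is 2 bytes, INT is 4), not from the number in parentheses.

```sql
-- Assumption: feed ids fit in 0..65535, so SMALLINT UNSIGNED is wide enough.
-- Assumption: is_unread only ever holds 0 or 1.
ALTER TABLE `posts`
  MODIFY `feed_id` SMALLINT UNSIGNED NOT NULL,
  MODIFY `is_unread` BIT(1) NOT NULL DEFAULT b'1';
```

Any column that is joined against feed_id should be given the same type, so that the comparison does not need a conversion.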

Please create the following indexes:
1) `post_id` in `category_post`
2) `post_id` in `newsletter_post`
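Written out as statements, the two indexes could look like this. The index names are hypothetical; only the table and column names come from the question.

```sql
-- Both pivot tables are joined to posts on post_id, so index that column.
CREATE INDEX `idx_category_post_post_id` ON `category_post` (`post_id`);
CREATE INDEX `idx_newsletter_post_post_id` ON `newsletter_post` (`post_id`);
```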

Related

How to optimize query by SUM of relations?

I have 3 simple tables
Invoices ( ~500k records )
Invoice items, one-to-many relation to invoices ( ~10 million records )
Invoice payments, one-to-many relation to invoices ( ~700k records )
Now, as simple as it sounds, I need to query for unpaid invoices.
Here is the query I am using:
select * from invoices
LEFT JOIN (SELECT invoice_id, SUM(price) as totalAmount
FROM invoice_items
GROUP BY invoice_id) AS t1
ON t1.invoice_id = invoices.id
LEFT JOIN (SELECT invoice_id, SUM(payed_amount) as totalPaid
FROM invoice_payment_transactions
GROUP BY invoice_id) AS t2
ON t2.invoice_id = invoices.id
WHERE totalAmount > totalPaid
Unfortunately, this query takes around 30 seconds, which is way too slow.
Of course I have indexes set for "invoice_id" on both payments and items.
When I "EXPLAIN" the query, I can see that mysql has to do a full table scan.
I also tried several other query approaches, using "EXISTS" or "IN" with subqueries, but I never got around the full table scan.
I'm pretty sure there is not much that can be done here (except some caching approach), but maybe someone knows how to optimize this?
I need this query to run in about 2 seconds at most.
EDIT:
Thanks to everybody for trying. Please note that I know how to adopt different caching strategies here, but this question is purely about optimizing this query!
Here are the ( simplified ) table definitions
CREATE TABLE `invoices`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`date` date NOT NULL,
`title` enum ('M','F','Other') DEFAULT NULL,
`first_name` varchar(191) DEFAULT NULL,
`family_name` varchar(191) DEFAULT NULL,
`street` varchar(191) NOT NULL,
`postal_code` varchar(10) NOT NULL,
`city` varchar(191) NOT NULL,
`country` varchar(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`)
) ENGINE = InnoDB;
CREATE TABLE `invoice_items`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`name` varchar(191) DEFAULT NULL,
`description` text DEFAULT NULL,
`reference` varchar(191) DEFAULT NULL,
`quantity` smallint(6) NOT NULL,
`price` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `invoice_items_invoice_id_index` (`invoice_id`)
) ENGINE = InnoDB;
CREATE TABLE `invoice_payment_transactions`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`transaction_identifier` varchar(191) NOT NULL,
`payed_amount` mediumint(9) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `invoice_payment_transactions_invoice_id_index` (`invoice_id`)
) ENGINE = InnoDB;
Plan A:
Build a summary table by invoice_id and day (as Bill suggested).
Plan B:
Change the design to be "current" and "history". This is where the "payments" is a "history" of money changing hands. Meanwhile "invoices" would be "current" in that it contains a "balance_owed" column. This is a philosophy change; it could (should) be encapsulated in a client subroutine and/or a database Stored Procedure.
Plan C: This may be useful if "most" of the invoices are paid off.
Have a flag in the invoices table to indicate paid-off. That will prevent "most" of the JOINs from occurring. (Well, adding that column is just as hard as doing Plan B.)
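Plan B could be sketched roughly as below. The column name balance_owed comes from the plan itself; the index name, the BIGINT type, and the backfill statement are assumptions. The sketch keeps the question's definition of the invoice total as SUM(price), and treats invoices with no items or no payments as 0 rather than NULL.

```sql
-- Keep the running balance on the invoice row and index it.
ALTER TABLE `invoices`
  ADD COLUMN `balance_owed` BIGINT NOT NULL DEFAULT 0,
  ADD INDEX `invoices_balance_owed_index` (`balance_owed`);

-- One-time backfill from the existing detail tables.
UPDATE `invoices` AS i
LEFT JOIN (SELECT invoice_id, SUM(price) AS total_amount
           FROM invoice_items
           GROUP BY invoice_id) AS t1
       ON t1.invoice_id = i.id
LEFT JOIN (SELECT invoice_id, SUM(payed_amount) AS total_paid
           FROM invoice_payment_transactions
           GROUP BY invoice_id) AS t2
       ON t2.invoice_id = i.id
SET i.balance_owed = COALESCE(t1.total_amount, 0) - COALESCE(t2.total_paid, 0);

-- Unpaid invoices can then be found without touching the detail tables.
SELECT * FROM `invoices` WHERE `balance_owed` > 0;
```

After the backfill, whatever code inserts items or payments also has to adjust balance_owed in the same transaction, which is the encapsulation the plan refers to.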

Slow search query with a one to many join

My problem is a slow search query with a one-to-many relationship between the tables. My tables look like this.
Table Assignment
CREATE TABLE `Assignment` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ProjectId` int(10) unsigned NOT NULL,
`AssignmentTypeId` smallint(5) unsigned NOT NULL,
`AssignmentNumber` varchar(30) NOT NULL,
`AssignmentNumberExternal` varchar(50) DEFAULT NULL,
`DateStart` datetime DEFAULT NULL,
`DateEnd` datetime DEFAULT NULL,
`DateDeadline` datetime DEFAULT NULL,
`DateCreated` datetime DEFAULT NULL,
`Deleted` datetime DEFAULT NULL,
`Lat` double DEFAULT NULL,
`Lon` double DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `idx_assignment_assignment_type_id` (`AssignmentTypeId`),
KEY `idx_assignment_assignment_number` (`AssignmentNumber`),
KEY `idx_assignment_assignment_number_external`
(`AssignmentNumberExternal`)
) ENGINE=InnoDB AUTO_INCREMENT=5280 DEFAULT CHARSET=utf8;
Table ExtraFields
CREATE TABLE `ExtraFields` (
`assignment_id` int(10) unsigned NOT NULL,
`name` varchar(30) NOT NULL,
`value` text,
PRIMARY KEY (`assignment_id`,`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
My search query
SELECT
`Assignment`.`Id`, COL_5_72, COL_5_73, COL_5_74, COL_5_75, COL_5_76,
COL_5_77 FROM (
SELECT
`Assignment`.`Id`,
`Assignment`.`AssignmentNumber` AS COL_5_72,
`Assignment`.`AssignmentNumberExternal` AS COL_5_73 ,
`AssignmentType`.`Name` AS COL_5_74,
`Assignment`.`DateStart` AS COL_5_75,
`Assignment`.`DateEnd` AS COL_5_76,
`Assignment`.`DateDeadline` AS COL_5_77,
CASE WHEN `ExtraField`.`Name` = "WorkDistrict" THEN
`ExtraField`.`Value` end as COL_5_78 FROM `Assignment`
LEFT JOIN `ExtraFields` as `ExtraField` on
`ExtraField`.`assignment_id` = `Assignment`.`Id`
WHERE `Assignment`.`Deleted` IS NULL -- Assignment should not be removed.
AND (1=1) -- Add assignment filters.
) AS q1
GROUP BY `Assignment`.`Id`
HAVING 1 = 1
AND COL_5_78 LIKE '%Amsterdam East%'
ORDER BY COL_5_72 ASC, COL_5_73 ASC;
Even when the table holds only around 3500 records, my query takes a couple of seconds to execute and return the results.
What is a better way to search in the related data? Should I just add a JSON field to the Assignment table and use the MySQL 5.7 JSON query features? Or did I make a mistake in designing my database?
You are selecting from a subquery, which forces MySQL to create an unindexed temporary table on every execution. Remove the subquery (you really don't need it here) and the query will be much faster.
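One way the query might look without the subquery is sketched below. It leaves out the AssignmentType name column because that table's definition is not shown in the question, and it assumes only assignments that have a matching WorkDistrict value are wanted. Since (assignment_id, name) is the primary key of ExtraFields, the join yields at most one row per assignment and no GROUP BY is needed.

```sql
SELECT a.`Id`,
       a.`AssignmentNumber`,
       a.`AssignmentNumberExternal`,
       a.`DateStart`,
       a.`DateEnd`,
       a.`DateDeadline`,
       ef.`value` AS WorkDistrict
FROM `Assignment` AS a
JOIN `ExtraFields` AS ef
  ON ef.`assignment_id` = a.`Id`
 AND ef.`name` = 'WorkDistrict'            -- uses the (assignment_id, name) primary key
WHERE a.`Deleted` IS NULL                  -- assignment should not be removed
  AND ef.`value` LIKE '%Amsterdam East%'
ORDER BY a.`AssignmentNumber` ASC, a.`AssignmentNumberExternal` ASC;
```

The leading wildcard in the LIKE pattern still prevents an index on value from being used, so each candidate value is compared as a string.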

Avoid UNION for two almost identical tables in MySQL

I'm not very good at MySQL, and I'm trying to write a query that counts the messages sent by a user, grouped by message type and the is_auto field.
Messages can be of type "small text message" or "newsletter". I created two entities with a few fields that differ between them. The important one is messages_count, which is absent from the newsletter table and is used in the query:
CREATE TABLE IF NOT EXISTS `small_text_message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`messages_count` int(11) NOT NULL,
`username` varchar(255) NOT NULL,
`method` varchar(255) NOT NULL,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
And:
CREATE TABLE `newsletter` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`subject` varchar(78) DEFAULT NULL,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
I ended up with a UNION query. Can this query be shortened or optimized since the only difference is messages_count that should be always 1 for newsletter?
SELECT
CONCAT('sms_', IF(is_auto = 0, 'user' , 'auto')) AS subtype,
SUM(messages_count * (customers_count + recipients_count)) AS count
FROM small_text_message WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
UNION
SELECT
CONCAT('newsletter_', IF(is_auto = 0, 'user' , 'auto')) AS subtype,
SUM(customers_count + recipients_count) AS count
FROM newsletter WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
I don't see any easy way to avoid a UNION (or UNION ALL) operation, that will return the specified result set.
I would recommend you use a UNION ALL operator in place of the UNION operator. Then the execution plan will not include the step that eliminates duplicate rows. (You already have GROUP BY operations on each query, and there is no way that those two queries can produce an identical row.)
Otherwise, your query looks fine just as it is written.
(It's always a good thing to consider the question, might there be a better way? To get the result set you are asking for, from the schema you have, your query looks about as good as it's going to get.)
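For reference, the query from the question with only that one change applied would read as follows. Nothing else is altered.

```sql
SELECT
  CONCAT('sms_', IF(is_auto = 0, 'user', 'auto')) AS subtype,
  SUM(messages_count * (customers_count + recipients_count)) AS count
FROM small_text_message
WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto
UNION ALL   -- no duplicate-elimination step; the two halves cannot produce the same row
SELECT
  CONCAT('newsletter_', IF(is_auto = 0, 'user', 'auto')) AS subtype,
  SUM(customers_count + recipients_count) AS count
FROM newsletter
WHERE status <> 'pending' AND user_id = 1
GROUP BY is_auto;
```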
If you are looking for more general DB advice, I recommend restructuring the tables to factor the common elements into one table, perhaps called outbound_communication or something similar, with all of your common fields, and then having "sub tables" for the specific types to hold the fields that are unique to each type. It does mean a simple JOIN is necessary to select all of the fields you want, but then again, it's normalized and actually makes situations like this one easier (one table holds all of the entities of interest). Additionally, you have the option of writing that JOIN just once as a "view", and then your existing code would not even need to change to see the two tables as if they had never changed.
CREATE TABLE IF NOT EXISTS `outbound_communication` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` longtext,
`sent_at` datetime DEFAULT NULL,
`status` varchar(255) NOT NULL,
`recipients_count` int(11) NOT NULL,
`customers_count` int(11) NOT NULL,
`sheduled_at` datetime DEFAULT NULL,
`sheduled_for` datetime DEFAULT NULL,
`is_auto` tinyint(1) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `small_text_message` (
`outbound_communication_id` int(11) NOT NULL,
`messages_count` int(11) NOT NULL,
`username` varchar(255) NOT NULL,
`method` varchar(255) NOT NULL,
PRIMARY KEY (`outbound_communication_id`),
FOREIGN KEY (`outbound_communication_id`)
REFERENCES `outbound_communication` (`id`)
) ENGINE=InnoDB;
CREATE TABLE `newsletter` (
`outbound_communication_id` int(11) NOT NULL,
`subject` varchar(78) DEFAULT NULL,
PRIMARY KEY (`outbound_communication_id`),
FOREIGN KEY (`outbound_communication_id`)
REFERENCES `outbound_communication` (`id`)
) ENGINE=InnoDB;
Then selecting a text msg is like this:
SELECT *
FROM outbound_communication AS parent
JOIN small_text_message
ON parent.id = small_text_message.outbound_communication_id
WHERE parent.id = 1234;
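The "write the JOIN once as a view" idea mentioned above could be sketched like this. The view name small_text_message_full is hypothetical; a view cannot share a name with the sub table itself, so existing code would have to read from the new name or the sub table would have to be renamed.

```sql
-- Expose the parent and child columns as one flat row per text message.
CREATE VIEW `small_text_message_full` AS
SELECT parent.`id`,
       parent.`content`,
       parent.`sent_at`,
       parent.`status`,
       parent.`recipients_count`,
       parent.`customers_count`,
       parent.`sheduled_at`,
       parent.`sheduled_for`,
       parent.`is_auto`,
       parent.`user_id`,
       child.`messages_count`,
       child.`username`,
       child.`method`
FROM `outbound_communication` AS parent
JOIN `small_text_message` AS child
  ON parent.`id` = child.`outbound_communication_id`;

-- Reading one message through the view.
SELECT * FROM `small_text_message_full` WHERE `id` = 1234;
```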
The nature of the query is inherently the union of the data from the small text message and the newsletter tables, so the UNION query is the only realistic formulation. There's no join of relevance between the two tables, for example.
So, I think you're very much on the right lines with your query.
Why are you worried about a UNION?

Optimizing MySQL query with expensive INNER JOIN

Using trial and error I've discovered that when I remove a join from the query below, it runs around 30 times quicker. Can someone explain why this would be, and whether it's possible to optimise the query to include the additional join without the performance hit?
This is a screenshot of the EXPLAIN output, which shows that the index isn't being used for the user_groups table.
http://i.imgur.com/9VDuV.png
This is the original query:
SELECT `comments`.`comment_id`, `comments`.`comment_html`, `comments`.`comment_time_added`, `comments`.`comment_has_attachments`, `users`.`user_name`, `users`.`user_id`, `users`.`user_comments_count`, `users`.`user_time_registered`, `users`.`user_time_last_active`, `user_profile`.`user_avatar`, `user_profile`.`user_signature_html`, `user_groups`.`user_group_icon`, `user_groups`.`user_group_name`
FROM (`comments`)
INNER JOIN `users` ON `comments`.`comment_user_id` = `users`.`user_id`
INNER JOIN `user_profile` ON `users`.`user_id` = `user_profile`.`user_id`
INNER JOIN `user_groups` ON `users`.`user_group_id` = `user_groups`.`user_group_id`
WHERE `comments`.`comment_enabled` = 1
AND `comments`.`comment_content_id` = 12
ORDER BY `comments`.`comment_time_added` ASC
LIMIT 20
If I remove the "user_groups" join then the query runs 30 times quicker as mentioned above.
SELECT `comments`.`comment_id`, `comments`.`comment_html`, `comments`.`comment_time_added`, `comments`.`comment_has_attachments`, `users`.`user_name`, `users`.`user_id`, `users`.`user_comments_count`, `users`.`user_time_registered`, `users`.`user_time_last_active`, `user_profile`.`user_avatar`, `user_profile`.`user_signature_html`
FROM (`comments`)
INNER JOIN `users` ON `comments`.`comment_user_id` = `users`.`user_id`
INNER JOIN `user_profile` ON `users`.`user_id` = `user_profile`.`user_id`
WHERE `comments`.`comment_enabled` = 1
AND `comments`.`comment_content_id` = 12
ORDER BY `comments`.`comment_time_added` ASC
LIMIT 20
My tables are below, can anyone offer any insight into how to avoid a performance hit for including the user_groups table?
--
-- Table structure for table `comments`
--
CREATE TABLE IF NOT EXISTS `comments` (
`comment_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`comment_content_id` int(10) unsigned NOT NULL,
`comment_user_id` mediumint(6) unsigned NOT NULL,
`comment_original` text NOT NULL,
`comment_html` text NOT NULL,
`comment_time_added` int(10) unsigned NOT NULL,
`comment_time_updated` int(10) unsigned NOT NULL,
`comment_enabled` tinyint(1) NOT NULL DEFAULT '0',
`comment_is_spam` tinyint(1) NOT NULL DEFAULT '0',
`comment_has_attachments` tinyint(1) unsigned NOT NULL,
`comment_has_edits` tinyint(1) NOT NULL,
PRIMARY KEY (`comment_id`),
KEY `comment_user_id` (`comment_user_id`),
KEY `comment_content_id` (`comment_content_id`),
KEY `comment_is_spam` (`comment_is_spam`),
KEY `comment_enabled` (`comment_enabled`),
KEY `comment_time_updated` (`comment_time_updated`),
KEY `comment_time_added` (`comment_time_added`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=352 ;
-- --------------------------------------------------------
--
-- Table structure for table `users`
--
CREATE TABLE IF NOT EXISTS `users` (
`user_id` mediumint(6) unsigned NOT NULL AUTO_INCREMENT,
`user_ipb_id` int(10) unsigned DEFAULT NULL,
`user_activated` tinyint(1) NOT NULL DEFAULT '0',
`user_name` varchar(64) CHARACTER SET latin1 NOT NULL,
`user_email` varchar(255) NOT NULL,
`user_password` varchar(40) NOT NULL,
`user_content_count` int(10) unsigned NOT NULL DEFAULT '0',
`user_comments_count` int(10) unsigned NOT NULL DEFAULT '0',
`user_salt` varchar(8) NOT NULL,
`user_api_key` varchar(32) NOT NULL,
`user_auth_key` varchar(32) DEFAULT NULL,
`user_paypal_key` varchar(32) DEFAULT NULL,
`user_timezone_id` smallint(3) unsigned NOT NULL,
`user_group_id` tinyint(3) unsigned NOT NULL,
`user_custom_permission_mask_id` tinyint(3) unsigned DEFAULT NULL,
`user_lang_id` tinyint(2) unsigned NOT NULL,
`user_time_registered` int(10) unsigned NOT NULL,
`user_time_last_active` int(10) unsigned NOT NULL,
PRIMARY KEY (`user_id`),
UNIQUE KEY `user_email` (`user_email`),
KEY `user_group_id` (`user_group_id`),
KEY `user_auth_key` (`user_auth_key`),
KEY `user_api_key` (`user_api_key`),
KEY `user_custom_permission_mask_id` (`user_custom_permission_mask_id`),
KEY `user_time_last_active` (`user_time_last_active`),
KEY `user_paypal_key` (`user_paypal_key`),
KEY `user_name` (`user_name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=33 ;
-- --------------------------------------------------------
--
-- Table structure for table `user_groups`
--
CREATE TABLE IF NOT EXISTS `user_groups` (
`user_group_id` tinyint(3) unsigned NOT NULL AUTO_INCREMENT,
`user_group_name` varchar(32) NOT NULL,
`user_group_permission_mask_id` tinyint(3) unsigned NOT NULL,
`user_group_icon` varchar(32) DEFAULT NULL,
PRIMARY KEY (`user_group_id`),
KEY `user_group_permission_mask_id` (`user_group_permission_mask_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=8 ;
-- --------------------------------------------------------
--
-- Table structure for table `user_profile`
--
CREATE TABLE IF NOT EXISTS `user_profile` (
`user_id` mediumint(8) unsigned NOT NULL,
`user_signature_original` text,
`user_signature_html` text,
`user_avatar` varchar(64) DEFAULT NULL,
`user_steam_id` varchar(64) DEFAULT NULL,
`user_ps_id` varchar(16) DEFAULT NULL,
`user_xbox_id` varchar(64) DEFAULT NULL,
`user_wii_id` varchar(64) DEFAULT NULL,
PRIMARY KEY (`user_id`),
KEY `user_steam_id` (`user_steam_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Most database engines calculate their query plan based on statistics about the tables - for instance, if a table has a small number of rows, it's quicker to go to the table than the index. Those statistics are maintained during "normal" operation - e.g. inserts, updates and deletes - but can get out of sync when table definitions are changed, or when you do bulk inserts.
If you see unexpected behaviour in the query plan, you can force the database to update its statistics; in MySQL you can use OPTIMIZE TABLE, which does everything, including reorganizing the table itself, or ANALYZE TABLE, which only updates the index statistics.
This is hard to do on production environments, as both operations lock the tables; if you can possibly negotiate a maintenance window, that's by far the simplest way to deal with the problem.
It's worth measuring performance of "optimize table" - on well-specified hardware, it should take only a couple of seconds for "normal" size tables (up to low millions of records, with only a few indices). That might mean you can have an "informal" maintenance window - you don't take the application off-line, you just accept that some users will have degraded performance while you're running the scripts.
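For the tables in this question, the two maintenance statements would look roughly like this. Which tables actually need it is an assumption; running ANALYZE TABLE on all four joined tables is simply the cheapest thing to try first.

```sql
-- Refresh the key distribution statistics the optimizer relies on.
ANALYZE TABLE `comments`, `users`, `user_profile`, `user_groups`;

-- Heavier option: rebuilds the table and its indexes as well.
OPTIMIZE TABLE `comments`;
```

Re-running EXPLAIN on the slow query afterwards shows whether the plan changed.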
MySQL has an EXPLAIN feature which will help you to understand the query:
$ mysql
> EXPLAIN SELECT `comments`.`comment_id`, `comments`.`comment_html`,`comments`.`comment_time_added`, `comments`.`comment_has_attachments`, `users`.`user_name`, `users`.`user_id`, `users`.`user_comments_count`, `users`.`user_time_registered`, `users`.`user_time_last_active`, `user_profile`.`user_avatar`, `user_profile`.`user_signature_html`
FROM (`comments`)
INNER JOIN `users` ON `comments`.`comment_user_id` = `users`.`user_id`
INNER JOIN `user_profile` ON `users`.`user_id` = `user_profile`.`user_id`
WHERE `comments`.`comment_enabled` = 1
AND `comments`.`comment_content_id` = 12
ORDER BY `comments`.`comment_time_added` ASC
LIMIT 20
MySQL might simply be missing or skipping an index.
You can learn more about reading the output of EXPLAIN from the documentation (a little hard-core), or better yet from a simpler explanation (ignore the fact that it's on a Java site).
More than likely the amount of data, or an outdated or incomplete index, is causing MySQL to do a table scan when it shouldn't. When you see table scans or sequential searches, you can often easily see which field is missing an index, or which index is not usable.
Could you please try this one (you can also remove the join with user_groups). It can be faster if the query retrieves a small data set from the comments table:
SELECT
comments.comment_id, comments.comment_html, comments.comment_time_added, comments.comment_has_attachments, users.user_name, users.user_id, users.user_comments_count, users.user_time_registered, users.user_time_last_active, user_profile.user_avatar, user_profile.user_signature_html, user_groups.user_group_icon, user_groups.user_group_name
FROM
(select * from comments where comment_content_id = 12 and comment_enabled = 1) comments
INNER JOIN users ON comments.comment_user_id = users.user_id
INNER JOIN user_profile ON users.user_id = user_profile.user_id
INNER JOIN user_groups ON users.user_group_id = user_groups.user_group_id
ORDER BY comments.comment_time_added ASC
LIMIT 20
Try using left joins on the non-null relations.
Since inner joins are always symmetric, MySQL will reorder them to put the best-looking (typically smallest) table first.
Since left joins aren't always symmetric, MySQL won't reorder them, so you can use them to force the table order. With a non-null field, however, left and inner joins are equivalent, so your results won't change.
The table order determines which indexes are used, which can greatly impact performance.
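Applied to the query in the question, the suggestion might look like the sketch below. The table aliases are introduced here for readability. The result is the same as with the inner join only if every users.user_group_id has a matching row in user_groups, which is an assumption about the data.

```sql
SELECT c.comment_id, c.comment_html, c.comment_time_added, c.comment_has_attachments,
       u.user_name, u.user_id, u.user_comments_count,
       u.user_time_registered, u.user_time_last_active,
       up.user_avatar, up.user_signature_html,
       ug.user_group_icon, ug.user_group_name
FROM comments AS c
INNER JOIN users AS u ON c.comment_user_id = u.user_id
INNER JOIN user_profile AS up ON u.user_id = up.user_id
-- LEFT JOIN keeps user_groups from being moved to the front of the join order.
LEFT JOIN user_groups AS ug ON u.user_group_id = ug.user_group_id
WHERE c.comment_enabled = 1
  AND c.comment_content_id = 12
ORDER BY c.comment_time_added ASC
LIMIT 20;
```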

Database slow retrieving/updating/inserting problem with more than 5 million records in each table

How should I structure a database to avoid slowdowns? (Engine: MyISAM)
Currently I have a database with more than 5 million records in one table, which makes data retrieval slow.
I'm searching for ways to structure the database to avoid this kind of slowdown.
The tables that cause problems are posts and comments, with more than 5 million records in each.
With text files as storage, I would save records by date, so that each file held little enough data not to slow down reading and saving. But with databases I don't know what to do :(
Is there any way to store data (approx. 5 million records per table) in a MySQL database without slow retrieval, inserts, or updates?
"posts" Structure
CREATE TABLE IF NOT EXISTS `ibf_posts` (
`pid` int(10) NOT NULL auto_increment,
`append_edit` tinyint(1) default '0',
`edit_time` int(10) default NULL,
`author_id` mediumint(8) NOT NULL default '0',
`author_name` varchar(32) default NULL,
`use_sig` tinyint(1) NOT NULL default '0',
`use_emo` tinyint(1) NOT NULL default '0',
`ip_address` varchar(16) default NULL,
`post_date` int(10) default NULL,
`icon_id` smallint(3) default NULL,
`post` text,
`queued` tinyint(1) NOT NULL default '0',
`topic_id` int(10) NOT NULL default '0',
`post_title` varchar(255) default NULL,
`new_topic` tinyint(1) default '0',
`edit_name` varchar(255) default NULL,
`post_key` varchar(32) default NULL,
`post_parent` int(10) NOT NULL default '0',
`post_htmlstate` smallint(1) NOT NULL default '0',
`post_edit_reason` varchar(255) default NULL,
PRIMARY KEY (`pid`),
KEY `topic_id` (`topic_id`,`queued`,`pid`,`post_date`),
KEY `author_id` (`author_id`,`topic_id`),
KEY `post_date` (`post_date`),
KEY `ip_address` (`ip_address`),
KEY `post_key` (`post_key`),
FULLTEXT KEY `post` (`post`),
FULLTEXT KEY `post_2` (`post`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query:
SELECT p.*, pp.*, m.id, m.name, m.mgroup, m.email, m.joined, m.posts, m.last_visit, m.last_activity, m.login_anonymous, m.title, m.hide_email, m.warn_level, m.warn_lastwarn, m.points, m.topics_started, m.skin,
me.msnname, me.aim_name, me.icq_number, me.signature, me.website, me.yahoo, me.location, me.avatar_location, me.avatar_type, me.avatar_size, m.members_display_name, m.custom_post_css, m.custom_right_img,
m.custom_post_color
FROM posts p
LEFT JOIN members m ON (m.id=p.author_id)
LEFT JOIN profile_portal pp ON (m.id=pp.pp_member_id)
LEFT JOIN member_extra me ON (me.id=m.id)
WHERE p.pid IN(--post ids here)
ORDER BY --ordering here
5M is not that much.
Probably you indexed the table wrong.
Please post your query and we'll probably tell you how to improve it.
Update:
SELECT p.*, pp.*, m.id, m.name, m.mgroup, m.email, m.joined, m.posts, m.last_visit, m.last_activity, m.login_anonymous, m.title, m.hide_email, m.warn_level, m.warn_lastwarn, m.points, m.topics_started, m.skin,
me.msnname, me.aim_name, me.icq_number, me.signature, me.website, me.yahoo, me.location, me.avatar_location, me.avatar_type, me.avatar_size, m.members_display_name, m.custom_post_css, m.custom_right_img,
m.custom_post_color
FROM posts p
LEFT JOIN
members m
ON m.id = p.author_id
LEFT JOIN
profile_portal pp
ON pp.pp_member_id = m.id
LEFT JOIN
member_extra me
ON me.id = m.id
WHERE p.pid IN (--post ids here)
ORDER BY
--ordering here
Make sure that:
members.id is a PRIMARY KEY
member_extra.id is a PRIMARY KEY
You have an index on profile_portal.pp_member_id
Also, you omitted the ORDER BY clause, but it matters too: an index can help the sort as well.
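A quick way to check the three points in the list above is sketched here. The index name idx_pp_member_id is hypothetical, and the ALTER TABLE is only needed if SHOW INDEX reports no index on that column.

```sql
-- Inspect the existing keys on the joined tables.
SHOW INDEX FROM `members`;
SHOW INDEX FROM `member_extra`;
SHOW INDEX FROM `profile_portal`;

-- Add the join index if it is missing.
ALTER TABLE `profile_portal`
  ADD INDEX `idx_pp_member_id` (`pp_member_id`);
```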
EXPLAIN will tell you how the query engine executes the query. If you see a full table scan (type ALL in MySQL's output), you know you need indexes.
5M rows in one table is not that much. How long are your queries taking? I suspect you may have some problems with indexing. The EXPLAIN statement may help you find out what your queries are actually doing.
If you have properly indexed tables and sane queries, you could look into partitioning.
Edit:
You could check whether adding INDEX(pid, author_id) or INDEX(author_id, pid) on table ibf_posts helps.
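Trying the first of the two variants might look like this. The index name and the three post ids are placeholders, and whether the index helps at all is exactly what the EXPLAIN is meant to show.

```sql
-- Hypothetical index name; drop it again if the plan does not improve.
ALTER TABLE `ibf_posts`
  ADD INDEX `idx_pid_author` (`pid`, `author_id`);

-- Compare the plan before and after; the ids are placeholder values.
EXPLAIN
SELECT p.pid, p.author_id
FROM ibf_posts AS p
WHERE p.pid IN (1, 2, 3);
```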