mysql join not use index for 'between' operator - mysql

So basically I have three tables:
CREATE TABLE `cdIPAddressToLocation` (
`IPADDR_FROM` int(10) unsigned NOT NULL COMMENT 'Low end of the IP Address block',
`IPADDR_TO` int(10) unsigned NOT NULL COMMENT 'High end of the IP Address block',
`IPLOCID` int(10) unsigned NOT NULL COMMENT 'The Location ID for the IP Address range',
PRIMARY KEY (`IPADDR_TO`),
KEY `Index_2` USING BTREE (`IPLOCID`),
KEY `Index_3` USING BTREE (`IPADDR_FROM`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `cdIPLocation` (
`IPLOCID` int(10) unsigned NOT NULL default '0',
`Country` varchar(4) default NULL,
`Region` int(10) unsigned default NULL,
`City` varchar(90) default NULL,
`PostalCode` varchar(10) default NULL,
`Latitude` float NOT NULL,
`Longitude` float NOT NULL,
`MetroCode` varchar(4) default NULL,
`AreaCode` varchar(4) default NULL,
`State` varchar(45) default NULL,
`Continent` varchar(10) default NULL,
PRIMARY KEY (`IPLOCID`)
) ENGINE=MyISAM AUTO_INCREMENT=218611 DEFAULT CHARSET=latin1;
and
CREATE TABLE 'data'{
'IP' varchar(50)
'SCORE' int
}
My task is to join these three tables and find the location data for given IP address.
My query is as follows:
select
t.ip,
l.Country,
l.State,
l.City,
l.PostalCode,
l.Latitude,
l.Longitude,
t.score
from
(select
ip, inet_aton(ip) ipv, score
from
data
order by score desc
limit 5) t
join
cdIPAddressToLocation a ON t.ipv between a.IPADDR_FROM and a.IPADDR_TO
join
cdIPLocation l ON l.IPLOCID = a.IPLOCID
While this query works, it's very very slow, it took about 100 seconds to return the result on my dev box.
I'm using mysql 5.1, the cdIPAddressToLocation has 5.9 million rows and cdIPLocation table has about 0.3 million rows.
When I check the execution plan, I found it's not using any index in the table 'cdIPAddressToLocation', so for each row in the 'data' table it would do a full table scan against table 'cdIPAddressToLocation'.
It is very weird to me. I mean since there are already two indexes in table 'cdIPAddressToLocation' on columns 'IPADDR_FROM' and 'IPADDR_TO', the execution plan should exploit the index to improve the performance, but why it didn't use them.
Or was there something wrong with my query?
Please help, thanks a lot.

Have you tried using a composite index on the columns cdIPAddressToLocation.IPADDR_FROM and cdIPAddressToLocation.IPADDR_TO?
Multiple-Column Indexes

Related

How to optimize query by SUM of relations?

I have 3 simple tables
Invoices ( ~500k records )
Invoice items, one-to-many relation to invoices ( ~10 million records )
Invoice payments, one-to-many relation to invoices ( ~700k records )
Now, as simple as it sounds, I need to query for unpaid invoices.
Here is the query I am using:
select * from invoices
LEFT JOIN (SELECT invoice_id, SUM(price) as totalAmount
FROM invoice_items
GROUP BY invoice_id) AS t1
ON t1.invoice_id = invoices.id
LEFT JOIN (SELECT invoice_id, SUM(payed_amount) as totalPaid
FROM invoice_payment_transactions
GROUP BY invoice_id) AS t2
ON t2.invoice_id = invoices.id
WHERE totalAmount > totalPaid
Unfortunately, this query takes around 30 seconds, so way to slow.
Of course I have indexes set for "invoice_id" on both payments and items.
When I "EXPLAIN" the query, I can see that mysql has to do a full table scan.
I also tried several other query approaches, using "EXISTS" or "IN" with subqueries, but I never got around the full table scan.
Pretty sure there is not much that can be done here ( except use some caching approach ), but maybe someone knows how to optimize this ?
I need this query to run in a +/-2 seconds max.
EDIT:
Thanks to everybody for trying. Please just know that I absolutely know how to adopt different caching strategies here, but this question is purely about optimizing this query !
Here are the ( simplified ) table definitions
CREATE TABLE `invoices`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`date` date NOT NULL,
`title` enum ('M','F','Other') DEFAULT NULL,
`first_name` varchar(191) DEFAULT NULL,
`family_name` varchar(191) DEFAULT NULL,
`street` varchar(191) NOT NULL,
`postal_code` varchar(10) NOT NULL,
`city` varchar(191) NOT NULL,
`country` varchar(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`)
) ENGINE = InnoDB
CREATE TABLE `invoice_items`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`name` varchar(191) DEFAULT NULL,
`description` text DEFAULT NULL,
`reference` varchar(191) DEFAULT NULL,
`quantity` smallint(6) NOT NULL,
`price` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `invoice_items_invoice_id_index` (`invoice_id`),
) ENGINE = InnoDB
CREATE TABLE `invoice_payment_transactions`
(
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`invoice_id` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`transaction_identifier` varchar(191) NOT NULL,
`payed_amount` mediumint(9) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `invoice_payment_transactions_invoice_id_index` (`invoice_id`),
) ENGINE = InnoDB
Plan A:
Summary table by invoice_id and day. (as Bill suggested) Summary Tables
Plan B:
Change the design to be "current" and "history". This is where the "payments" is a "history" of money changing hands. Meanwhile "invoices" would be "current" in that it contains a "balance_owed" column. This is a philosophy change; it could (should) be encapsulated in a client subroutine and/or a database Stored Procedure.
Plan C: This may be useful if "most" of the invoices are paid off.
Have a flag in the invoices table to indicate paid-off. That will prevent "most" of the JOINs from occurring. (Well, adding that column is just as hard as doing Plan B.)

MySQL Query Optimization that touches three tables via a union of two of them

I have a query that returns results from a single table based on the provided ID existing in a column in one of two, or both, tables. The DB schema for the relevant tables is provided below as well as the initial query and then what was later recommended to me by a peer. I go into some details below as to why this query works but I need to optimize it farther for larger datasets and pagination.
CREATE TABLE `killmails` (
`id` BIGINT(20) UNSIGNED NOT NULL,
`hash` VARCHAR(255) NOT NULL,
`moon_id` BIGINT(20) NULL DEFAULT NULL,
`solar_system_id` BIGINT(20) UNSIGNED NOT NULL,
`war_id` BIGINT(20) NULL DEFAULT NULL,
`is_npc` TINYINT(1) NOT NULL DEFAULT '0',
`is_awox` TINYINT(1) NOT NULL DEFAULT '0',
`is_solo` TINYINT(1) NOT NULL DEFAULT '0',
`dropped_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`destroyed_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`fitted_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`total_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`killmail_time` DATETIME NOT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`, `hash`),
INDEX `total_value` (`total_value`),
INDEX `killmail_time` (`killmail_time`),
INDEX `solar_system_id` (`solar_system_id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_attackers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_done` BIGINT(20) UNSIGNED NOT NULL,
`final_blow` TINYINT(1) NOT NULL DEFAULT '0',
`security_status` DECIMAL(17,15) NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`weapon_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `weapon_type_id` (`weapon_type_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_attackers_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_victim` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_taken` BIGINT(20) UNSIGNED NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NOT NULL,
`ship_value` DECIMAL(18,4) NOT NULL DEFAULT '0.0000',
`pos_x` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_y` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_z` DECIMAL(30,10) NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_victim_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This first query is where the problem started:
SELECT
*
FROM
killmails k
LEFT JOIN killmail_attackers ka ON k.id = ka.killmail_id
LEFT JOIN killmail_victim kv ON k.id = kv.killmail_id
WHERE
ka.character_id = ?
OR kv.character_id = ?
ORDER BY killmails.killmail_time DESC
LIMIT ? OFFSET ?
This worked okay, but long query times. We optimized to this
SELECT
killmails.*,
FROM (
SELECT killmail_victim.killmail_id FROM killmail_victim
WHERE killmail_victim.corporation_id = ?
UNION
SELECT killmail_attackers.killmail_id FROM killmail_attackers
WHERE killmail_attackers.corporation_id = ?
) SELECTED_KMS
LEFT JOIN killmails ON killmails.id = SELECTED_KMS.killmail_id
ORDER BY killmails.killmail_time DESC
LIMIT ? OFFSET ?
I saw a huge improvement in query times when looking up killmails for characters, however when I started querying for larger datasets like corporation and alliance killmails, the query slows down. This is because the queries that are union'd together can potentially return large sets of data and the time it takes to read all that into memory so that the SELECTED_KMS table can be created is what I believe is taking so much time. Most of the time, with alliances, my connection to the database times out from the application. One alliance returned 900K killmailIDs from one of the union'd tables, not sure what the other returned.
I can easily add limit statements to the internal queries, but this will introduce a lot of complications when I get to paginating the data or when I introduce a feature to search for KMs by date for example.
I am looking for suggestions on how this query can be optimized and still allow for easy pagination in the near future.
Thank You
Change INDEX(corporation_id) in both tables to INDEX(corporation_id, killmail_id) so that the inner queries will be "covering".
In general, INDEX(a) is useless when you also have INDEX(a,b). Any query that needs just a, can use either of those indexes. (This rule does not apply to b; only the "leftmost" column(s).)
Where does killmails.id come from? It's not AUTO_INCREMENT; it is not alone in the PRIMARY KEY, so there is no specified "uniqueness" constraint. Is it unique by some other design? Is it computed somewhere else in the code? (I ask because I need a feel for its uniqueness and other characteristics.)
Add INDEX(id, killmails_time).
What version are you using?
Perhaps UNION ALL give the same results? It would be faster because it would not need to de-dup.
How much RAM do you have? What is the value of innodb_buffer_pool_size?
Do you really need 8-byte BIGINTs? Even if your application is using longlong (or whatever it calls it), you can probably change the schema without changing the app.
Do you need this much precision and range? DECIMAL(30,10) -- it takes 14 bytes each. DOUBLE would give you about 16 significant digits in 8 bytes, with a wider range of values (up to about 10^308). What "units" are you using? (Overkill for light-years or parsecs; inadequate for miles or km. Perhaps AUs? Then the bottom digit would be a precision of a few meters?)
The last few questions are aimed at shrinking the table and seeing if we can avoid it being as I/O-bound as it apparently is now.
Important
innodb_buffer_pool_size = 128M is terribly small, especially for a 32GB machine, and especially if your dataset is much bigger than 128MB. If there are not any other apps running on the server, bump that setting up to 20G.

MySQL RDS performance of aggregation functions

We have a query that performs some aggregation on one column.
The filtering of the data seems to be pretty fast, but the aggregation seems to take too much time.
This query returns ~ 1.5 million rows. It runs for 0.6 seconds (if we want to return the data to the client it takes ~ 2 minutes - the way we tested this is by using the pymysql python library. We used an unbuffered cursor, so we can distinguish between query run time and fetch time):
SELECT *
FROM t_data t1
WHERE (t1.to_date = '2019-03-20')
AND (t1.period = 30)
AND (label IN ('aa','bb') )
AND ( id IN (
SELECT id
FROM t_location_data
WHERE (to_date = '2019-03-20') AND (period = 30)
AND ( country = 'Narniya' ) ) )
But if we run this query:
SELECT MAX(val) val_max,
AVG(val) val_avg,
MIN(val) val_min
FROM t_data t1
WHERE (t1.to_date = '2019-03-20')
AND (t1.period = 30)
AND (label IN ('aa','bb') )
AND ( id IN (
SELECT id
FROM t_location_data
WHERE (to_date = '2019-03-20') AND (period = 30)
AND ( country = 'Narniya' ) ) )
We see that the time to run the query takes 40 seconds and the time to fetch the results in this case is obviously less than a second..
Any help with this terrible performance of the aggregation functions over RDS Aurora? Why calculating Max Min and Avergae on 1.5 million lines takes so long (When comparing to Python on those same numbers, the calculation takes less than 1 second..)
NOTE: We added random number to each select to make sure we do not get cached values.
We use Aurora RDS:
1 instance of db.r5.large (2 vCPU + 16 GB RAM)
MySQL Engine version: 5.6.10a
Create table:
Create Table: CREATE TABLE `t_data` (
`id` varchar(256) DEFAULT NULL,
`val2` int(11) DEFAULT NULL,
`val3` int(11) DEFAULT NULL,
`val` int(11) DEFAULT NULL,
`val4` int(11) DEFAULT NULL,
`tags` varchar(256) DEFAULT NULL,
`val7` int(11) DEFAULT NULL,
`label` varchar(32) DEFAULT NULL,
`val5` varchar(64) DEFAULT NULL,
`val6` int(11) DEFAULT NULL,
`period` int(11) DEFAULT NULL,
`to_date` varchar(64) DEFAULT NULL,
`data_line_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`data_line_id`),
UNIQUE KEY `id_data` (`to_date`,`period`,`id`),
KEY `index1` (`to_date`,`period`,`id`),
KEY `index3` (`to_date`,`period`,`label`)
) ENGINE=InnoDB AUTO_INCREMENT=218620560 DEFAULT CHARSET=latin1
Create Table: CREATE TABLE `t_location_data` (
`id` varchar(256) DEFAULT NULL,
`country` varchar(256) DEFAULT NULL,
`state` varchar(256) DEFAULT NULL,
`city` varchar(256) DEFAULT NULL,
`latitude` float DEFAULT NULL,
`longitude` float DEFAULT NULL,
`val8` int(11) DEFAULT NULL,
`val9` tinyint(1) DEFAULT NULL,
`period` int(11) DEFAULT NULL,
`to_date` varchar(64) DEFAULT NULL,
`location_line_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`location_line_id`),
UNIQUE KEY `id_location_data` (`to_date`,`period`,`id`,`latitude`,`longitude`),
KEY `index1` (`to_date`,`period`,`id`,`country`),
KEY `index2` (`country`,`state`,`city`),
KEY `index3` (`to_date`,`period`,`country`,`state`)
) ENGINE=InnoDB AUTO_INCREMENT=315944737 DEFAULT CHARSET=latin1
Parameters:
##innodb_buffer_pool_size/1024/1024/1024: 7.7900
##innodb_buffer_pool_instances: 8
UPDATE:
Adding the val index (like suggest by #rick-james) did improve the query dramatically (took ~2 seconds) only if I delete the AND ( id IN (SELECT id FROM t_location_data.. condition. If I leave it, the query runs for about ~25 seconds.. better than before but still not good..
Indexes needed:
t_data: INDEX(period, to_date, label, val)
t_data: INDEX(period, label, to_date, val)
t_location_data: INDEX(period, country, to_date, id)
Also, change from the slow IN ( SELECT ... ) to a JOIN:
FROM t_data AS d
JOIN t_location_data AS ld USING(id)
WHERE ...
Better yet, since the tables are 1:1 (is that correct?), combine the tables so as to eliminate the JOIN. If id is not the PRIMARY KEY in each table, you really need to provide SHOW CREATE TABLE and should change the name(s).

Slow search query with a one to many join

My problem is a slow search query with a one-to-many relationship between the tables. My tables look like this.
Table Assignment
CREATE TABLE `Assignment` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ProjectId` int(10) unsigned NOT NULL,
`AssignmentTypeId` smallint(5) unsigned NOT NULL,
`AssignmentNumber` varchar(30) NOT NULL,
`AssignmentNumberExternal` varchar(50) DEFAULT NULL,
`DateStart` datetime DEFAULT NULL,
`DateEnd` datetime DEFAULT NULL,
`DateDeadline` datetime DEFAULT NULL,
`DateCreated` datetime DEFAULT NULL,
`Deleted` datetime DEFAULT NULL,
`Lat` double DEFAULT NULL,
`Lon` double DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `idx_assignment_assignment_type_id` (`AssignmentTypeId`),
KEY `idx_assignment_assignment_number` (`AssignmentNumber`),
KEY `idx_assignment_assignment_number_external`
(`AssignmentNumberExternal`)
) ENGINE=InnoDB AUTO_INCREMENT=5280 DEFAULT CHARSET=utf8;
Table ExtraFields
CREATE TABLE `ExtraFields` (
`assignment_id` int(10) unsigned NOT NULL,
`name` varchar(30) NOT NULL,
`value` text,
PRIMARY KEY (`assignment_id`,`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
My search query
SELECT
`Assignment`.`Id`, COL_5_72, COL_5_73, COL_5_74, COL_5_75, COL_5_76,
COL_5_77 FROM (
SELECT
`Assignment`.`Id`,
`Assignment`.`AssignmentNumber` AS COL_5_72,
`Assignment`.`AssignmentNumberExternal` AS COL_5_73 ,
`AssignmentType`.`Name` AS COL_5_74,
`Assignment`.`DateStart` AS COL_5_75,
`Assignment`.`DateEnd` AS COL_5_76,
`Assignment`.`DateDeadline` AS COL_5_77 FROM `Assignment`
CASE WHEN `ExtraField`.`Name` = "WorkDistrict" THEN
`ExtraField`.`Value` end as COL_5_78 FROM `Assignment`
LEFT JOIN `ExtraFields` as `ExtraField` on
`ExtraField`.`assignment_id` = `Assignment`.`Id`
WHERE `Assignment`.`Deleted` IS NULL -- Assignment should not be removed.
AND (1=1) -- Add assignment filters.
) AS q1
GROUP BY `Assignment`.`Id`
HAVING 1 = 1
AND COL_5_78 LIKE '%Amsterdam East%'
ORDER BY COL_5_72 ASC, COL_5_73 ASC;
When the table is only around 3500 records my query takes a couple of seconds to execute and return the results.
What is a better way to search in the related data? Should I just add a JSON field to the Assignment table and use the MySQL 5.7 Json query features? Or did I made a mistake in designing my database?
You are using select from subquery that forces MySQL to create unindexed temp table for each execution. Remove subquery (you really don't need it here) and it will be much faster.

How to optimize this mysql join on large table?

I have a project where the admin needs to create multiple newsletters with some crawled posts from the web.
I insert the posts in posts table after crawling has completed and assign them a feed_id to identify the source. this is the structure of posts table (truncated):
CREATE TABLE `posts` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`feed_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`identifier` varchar(255) DEFAULT NULL,
`published` timestamp NULL DEFAULT NULL,
`content` longtext,
...
...
`is_unread` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Every admin (user) has access to one or more "feeds". So in Newsletter creation page I want to show them a list of posts from the feeds they are allowed to see and also, I show a button to put the posts in specifict categories of that newsletter, if the user previously selected that post, I should show him that and let him remove it from the category. So I have some other tables too: newsletters, categories, newsletter_post, category_post. Here is their structures:
newsletters:
CREATE TABLE `newsletters` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`sent_at` timestamp NULL DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`date` date DEFAULT NULL,
`topic_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
categories:
CREATE TABLE `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`topic_id` int(11) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`slug` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
newsletter_post:
CREATE TABLE `newsletter_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`newsletter_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
category_post:
CREATE TABLE `category_post` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT NULL,
`category_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So I'm using this query to find posts for the allowed feeds and check the status if a post is in a specific category of this specific newsletter:
SELECT DISTINCT `posts`.`id`, `published`, `posts`.`title`, `posts`.`content`, `source_name`, `category_id`, `newsletter_id`, `link_href`, categories.title as category_title
FROM `posts`
LEFT JOIN `category_post` ON `posts`.`id` = `category_post`.`post_id`
LEFT JOIN `categories` ON `categories`.`id` = `category_post`.`category_id`
LEFT JOIN `newsletter_post` ON `posts`.`id` = `newsletter_post`.`post_id`
LEFT JOIN `newsletters` ON `newsletters`.`id` = `newsletter_post`.`newsletter_id`
WHERE `feed_id` IN (6, 7) ORDER BY `posts`.`published` DESC LIMIT 40 OFFSET 0
but the problem is this is horrible and not optimized. My posts table contains up to 50,000 rows each month, and each row with 3~10kbs of data in avg., so sometimes when I try to run the query (which is frequently run by the admin to make the newsletter, pagination etc) mysql shows this error: too much rows to join, etc. and most of the times its really slow.
and the reason I'm doing all this in one query is because I want the result to be in one json response so I can show them the user quickly without doing additional requests.
I wanna know if there is a better way to do this query or use indexes or something else.
Thanks you in advance for your help.
index your posts table on
( feed_id, published )
so the data is already optimized for your WHERE clause, and pre-sorted to help your ORDER BY.
For reading querys that have a lot of demand, InnoDB is very inefficient. I recommend you to use a NoSQL Database but if you don't want or the cost of change is too much... you can try this:
1) LIKE Sallar Kaboli told you, you have to index your tables in columns that use in JOIN querys. For example:
CREATE INDEX index1 ON newsletter_post (post_id);
2) USE only important columns for JOINS.
I mean, you have to only use the columns that use in SELECT part of query.
I hope this'd be helpful.
To complete other answers, I suggest to change this types on posts table:
1) Change feed_id to int(4). Really you have more than int(4) feeds?
2) Change is_unread to bit instead of int(1). I should say that this may not improve your given query in the question but according to the field name, the correct type is bit.
Another more improvement to this answer is that never use default int(11) for numeric or id fields, assign types more specific. Using smaller size of types will improve your indexes also. I don't think you need more than int(4) for fields id.
For example indexing and querying int(3) column is more faster than int(11).
Please create the following indexes indexes on ::
1) `post_id` in `category_post`
2) `post_id` in `newsletter_post`