I have a query that is taking an embarrassingly long time. ~7 minutes embarrassing. I would really appreciate some help. Missing indexes? Rewrite the query? All of the above?
Many thanks
mysql Ver 14.14 Distrib 5.7.25, for Linux (x86_64)
The query looks like:
SELECT COUNT(*) AS count_all, name
FROM api_events ae
INNER JOIN products p on p.token=ae.product_token
WHERE (ae.created_at > '2019-01-21 12:16:53.853732')
GROUP BY name
Here are the two table definitions
api_events has ~31 million records
CREATE TABLE `api_events` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`api_name` varchar(200) NOT NULL,
`hostname` varchar(200) NOT NULL,
`controller_action` varchar(2000) NOT NULL,
`duration` decimal(12,5) NOT NULL DEFAULT '0.00000',
`view` decimal(12,5) NOT NULL DEFAULT '0.00000',
`db` decimal(12,5) NOT NULL DEFAULT '0.00000',
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`product_token` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `product_token` (`product_token`)
) ENGINE=InnoDB AUTO_INCREMENT=64851218 DEFAULT CHARSET=latin1;
and
products has only 12 records
CREATE TABLE `products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(30) NOT NULL,
`name` varchar(100) NOT NULL,
`description` varchar(2000) NOT NULL,
`token` varchar(50) NOT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=19 DEFAULT CHARSET=latin1;
You could improve the join performance by adding indexes:
create index idx1 on api_events(product_token, created_at);
create index idx2 on products(token);
You could also try inverting the column order for api_events (as an alternative to the idx1 definition above):
create index idx1 on api_events(created_at, product_token);
and try adding redundancy to the products index so that it also covers name (again as an alternative to the idx2 definition above):
create index idx2 on products(token, name);
For the query as stated, you need
api_events: INDEX(created_at, product_token)
products: INDEX(token, name)
Because the WHERE mentions api_events, the Optimizer is likely to start with that table. created_at is in the WHERE, so the index starts with that, even though starting with a 'range' is usually wrong. In this case, the pair is "covering".
Then, INDEX(token, name) is also "covering".
"Covering" indexes give a small, but widely varying, amount of performance improvement.
What happens if you group by the token instead of the name?
SELECT ae.product_token, COUNT(*) AS count_all
FROM api_events ae
WHERE ae.created_at > '2019-01-21 12:16:53.853732'
GROUP BY ae.product_token;
For this query, an index on api_events(created_at, product_token) will probably help.
If this is faster, then you can bring in the name using a subquery.
It seems like the criteria on created_at is very selective (looking at only the past 7 days?). That's crying out to explore an index with created_at as a leading column.
The query is also referencing the product_token column from the same table, so we can include that column in the index, to make it a covering index.
CREATE INDEX api_events_IX ON api_events ( created_at, product_token )
Using that index, we can probably avoid looking at the vast majority of the 31 million rows, and quickly narrow in on the subset of rows we actually need to look at.
Using the index, the query will still need a "Using filesort" operation to satisfy the GROUP BY.
(My guess here is that the join to the 12 rows in products doesn't exclude a lot of rows; that is, for the vast majority of rows in api_events the product_token refers to a row that exists in products.)
Use MySQL EXPLAIN to see the query execution plan.
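As a minimal sketch of that check, assuming an index on api_events(created_at, product_token) has already been created, EXPLAIN can be prefixed to the original query. The comments describe what one would hope to see, not output from a real run.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT COUNT(*) AS count_all, p.name
FROM api_events ae
INNER JOIN products p ON p.token = ae.product_token
WHERE ae.created_at > '2019-01-21 12:16:53.853732'
GROUP BY p.name;

-- Things to look at in the row for `ae`:
--   type  : 'range' suggests the created_at condition is applied through the index
--   key   : the name of the index the optimizer actually chose
--   Extra : 'Using index' means the index is covering for that table;
--           'Using temporary' / 'Using filesort' relate to the GROUP BY
```

If the key column is NULL or type is ALL for api_events, the new index is not being used and the table is still scanned in full.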
A further possible refinement (to test the performance of) would be to do some of the aggregation in an inline view:
SELECT SUM(s.count_all) AS count_all
, p.name
FROM ( SELECT COUNT(*) AS count_all
, ae.product_token
FROM api_events ae
WHERE ae.created_at > '2019-01-21 12:16:53.853732'
GROUP
BY ae.product_token
) s
JOIN products p
ON p.token = s.product_token
GROUP
BY p.name
If the assumption about product_token is misinformed, that is, if there are lots of rows in api_events that have product_token values that don't reference a row in products ... we might take a different tack ...
I am optimising my queries and found something I can't get my head around.
I am using the following query to select a bunch of categories, combining them with an alias from a table containing old and new aliases for categories:
SELECT `c`.`id` AS `category.id`,
(SELECT `alias`
FROM `aliases`
WHERE category_id = c.id
AND `old` = 0
AND `lang_id` = 1
ORDER BY `id` DESC
LIMIT 1) AS `category.alias`
FROM (`categories` AS c)
WHERE `c`.`status` = 1 AND `c`.`parent_id` = '11';
There are only 2 categories with a value of 11 for parent_id, so it should look up 2 categories from the alias table.
Still if I use EXPLAIN it says it has to process 48 rows. The alias table contains 1 entry per category as well (in this case, it can be more). Everything is indexed and if I understand correctly therefore it should find the correct alias immediately.
Now here's the weird thing. When I don't compare the aliases by the categories from the conditions, but manually by the category ids the query returns, it does process only 1 row, as intended with the index.
So I replace WHERE category_id = c.id by WHERE category_id IN (37, 43) and the query gets faster:
The only thing I can think of is that the subquery isn't run over the results from the query but before some filtering is done. Any kind of explanation or help is welcome!
Edit: silly me, the WHERE IN doesn't work as it doesn't make a unique selection. The question still stands though!
Create table schema
CREATE TABLE `aliases` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lang_id` int(2) unsigned NOT NULL DEFAULT '1',
`alias` varchar(255) DEFAULT NULL,
`product_id` int(10) unsigned DEFAULT NULL,
`category_id` int(10) unsigned DEFAULT NULL,
`brand_id` int(10) unsigned DEFAULT NULL,
`page_id` int(10) unsigned DEFAULT NULL,
`campaign_id` int(10) unsigned DEFAULT NULL,
`old` tinyint(1) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `product_id` (`product_id`),
KEY `category_id` (`category_id`),
KEY `page_id` (`page_id`),
KEY `alias_product_id` (`product_id`,`alias`),
KEY `alias_category_id` (`category_id`,`alias`),
KEY `alias_page_id` (`page_id`,`alias`),
KEY `alias_brand_id` (`brand_id`,`alias`),
KEY `alias_product_id_old` (`alias`,`product_id`,`old`),
KEY `alias_category_id_old` (`alias`,`category_id`,`old`),
KEY `alias_brand_id_old` (`alias`,`brand_id`,`old`),
KEY `alias_page_id_old` (`alias`,`page_id`,`old`),
KEY `lang_brand_old` (`lang_id`,`brand_id`,`old`),
KEY `id_category_id_lang_id_old` (`lang_id`,`old`,`id`,`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=112392 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
SELECT ...
WHERE x=1 AND y=2
ORDER BY id DESC
LIMIT 1
will be performed in one of several ways.
Since you have not shown us the indexes you have (SHOW CREATE TABLE), I will cover some likely cases...
INDEX(x, y, id) -- This can find the last row for that condition, so it does not need to look at more than one row.
Some other index, or no index: Scan DESCending from the last id checking each row for x=1 AND y=2, stopping when (if) such a row is found.
Some other index, or no index: Scan the entire table, checking each row for x=1 AND y=2; collect them into a temp table; sort by id; deliver one row.
Some of the EXPLAIN clues:
Using where -- does not say much
Using filesort -- it did a sort, apparently for the ORDER BY. (It may have been entirely done in RAM; ignore 'file'.)
Using index condition (not "Using index") -- this indicates an internal optimization in which it can check the WHERE clause more efficiently than it used to in older versions.
Do not trust the "Rows" in EXPLAIN. Often they are reasonably correct, but sometimes they are off by orders of magnitude. Here is a better way to see "how much work" is being done in a rather fast query:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
With the CREATE TABLE, I may have suggestions on how to improve the index.
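Applied to the aliases table from the question, the INDEX(x, y, id) case could look like the sketch below. The index name is hypothetical, and whether the optimizer picks it is an assumption to verify. The existing id_category_id_lang_id_old index places category_id after id, so it cannot seek directly to a single category.

```sql
-- Equality columns first (category_id, lang_id, old), then the ORDER BY column (id).
ALTER TABLE aliases
  ADD INDEX category_lang_old_id (category_id, lang_id, old, id);

-- Measure the real work done rather than trusting the EXPLAIN row estimate.
FLUSH STATUS;

SELECT c.id AS `category.id`,
       (SELECT a.alias
        FROM aliases a
        WHERE a.category_id = c.id
          AND a.old = 0
          AND a.lang_id = 1
        ORDER BY a.id DESC
        LIMIT 1) AS `category.alias`
FROM categories AS c
WHERE c.status = 1 AND c.parent_id = 11;

SHOW SESSION STATUS LIKE 'Handler%';
```

Low Handler_read counters after the query would suggest that each subquery execution touches only a handful of index entries.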
I would like to understand at what point in time will MySQL use the indexed column when using ORDER BY.
For example, the query
SELECT * FROM A
INNER JOIN B ON B.id = A.id
WHERE A.status = 1 AND A.name = 'Mike' AND A.created_on BETWEEN '2014-10-01 00:00:00' AND NOW()
ORDER BY A.accessed_on DESC
Based on my knowledge, a good index for the above query is an index on table A (id, status, name, created_on, accessed_on) and another on B.id.
I also understand that SQL execution follows the order below, but I am not sure how the selection and ordering work.
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
Question
Would it be better to start the index with the id column, or does it not matter in this case since the WHERE is executed before the JOIN? or should it be
Second question: should the accessed_on column be at the beginning, the middle, or the end of the index? Or should the id column come after all the columns in the WHERE clause?
I appreciate a detailed answer so I can understand the execution level of MySQL/SQL
UPDATED
I added a few million records to both tables A and B, then added multiple indexes to see which would be the best. MySQL seems to prefer the index id_2 (i.e. (status, name, created_on, id, accessed_on)).
It seems to apply the WHERE first and work out that it needs an index on status, name, created_on; then it applies the INNER JOIN and uses the id column that follows those first 3; finally, it looks for accessed_on as the last column. So the index (status, name, created_on, id, accessed_on) fits the same execution order.
Here are the table structures
CREATE TABLE `a` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`status` int(2) NOT NULL,
`name` varchar(255) NOT NULL,
`created_on` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`accessed_on` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `status` (`status`,`name`),
KEY `status_2` (`status`,`name`,`created_on`),
KEY `status_3` (`status`,`name`,`created_on`,`accessed_on`),
KEY `status_4` (`status`,`name`,`accessed_on`),
KEY `id` (`id`,`status`,`name`,`created_on`,`accessed_on`),
KEY `id_2` (`status`,`name`,`created_on`,`id`,`accessed_on`)
) ENGINE=InnoDB AUTO_INCREMENT=3135750 DEFAULT CHARSET=utf8
CREATE TABLE `b` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3012644 DEFAULT CHARSET=utf8
The best indexes for this query are: A(status, name, created_on) and B(id). These indexes will satisfy the where clause and use the index for the join to B.
This index will not be used for sorting. There are two major impediments to using any index for sorting. The first is the join. The second is the non-equality on created_on. Some databases might figure out to use an index on A(status, name, accessed_on), but I don't think MySQL is smart enough for that.
You don't want id as the first column in the index. This precludes using the index to filter on A, because id is used for the join rather than in the where.
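One way to test these claims against the tables shown in the question is to pin the index with a hint and read the plan. This is only a sketch: status_2 is the existing key on (status, name, created_on), and the comments describe what to look for rather than a recorded result.

```sql
-- Ask the optimizer to consider only the (status, name, created_on) index on a.
EXPLAIN
SELECT *
FROM a USE INDEX (status_2)
INNER JOIN b ON b.id = a.id
WHERE a.status = 1
  AND a.name = 'Mike'
  AND a.created_on BETWEEN '2014-10-01 00:00:00' AND NOW()
ORDER BY a.accessed_on DESC;

-- In the row for `a`, key should read status_2 and type should be 'range'.
-- 'Using filesort' in Extra would confirm that the index filters the rows
-- but is not used for the ORDER BY.
```

Repeating the EXPLAIN with USE INDEX (status_4) or USE INDEX (id) gives a direct comparison of the estimated rows and the Extra column for each candidate.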
I am trying to generate a group query on a large table (more than 8 million rows). However, I can reduce the need to group all the data by filtering on date. I have a view that captures the dates I require, and this limits the query, but it's not much better.
Finally I need to join to another table to pick up a field.
I am showing the query, the create on the main table and the query explain below.
Main Query:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
INNER JOIN summary_max
ON pgi_raw_data.wsp_channel = summary_max.wsp_channel
AND pgi_raw_data.dated > summary_max.race_date
INNER JOIN pgi_accounts
ON pgi_raw_data.account = pgi_accounts.account
GROUP BY pgi_raw_data.event_id
ORDER BY NULL
The create table:
CREATE TABLE `pgi_raw_data` (
`event_id` char(25) NOT NULL DEFAULT '',
`wsp_channel` varchar(5) NOT NULL,
`dated` date NOT NULL,
`time` time DEFAULT NULL,
`program` varchar(5) NOT NULL,
`track` varchar(25) NOT NULL,
`raceno` tinyint(2) NOT NULL,
`detail` varchar(30) DEFAULT NULL,
`ticket` varchar(20) NOT NULL DEFAULT '',
`breed` varchar(12) NOT NULL,
`pool` varchar(10) NOT NULL,
`gross` decimal(11,2) NOT NULL,
`refunds` decimal(11,2) NOT NULL,
`handle` decimal(11,2) NOT NULL,
`payout` decimal(11,4) NOT NULL,
`rebate` decimal(11,4) NOT NULL,
`profit` decimal(11,4) NOT NULL,
`account` mediumint(10) NOT NULL,
PRIMARY KEY (`event_id`,`ticket`),
KEY `idx_account` (`account`),
KEY `idx_wspchannel` (`wsp_channel`,`dated`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
This is my view for summary_max:
CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW
`summary_max` AS select `pgi_summary_tbl`.`wsp_channel` AS
`wsp_channel`,max(`pgi_summary_tbl`.`race_date`) AS `race_date`
from `pgi_summary_tbl` group by `pgi_summary_tbl`.`wsp
And also the evaluated query:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 6 Using temporary
1 PRIMARY pgi_raw_data ref idx_account,idx_wspchannel idx_wspchannel 7 summary_max.wsp_channel 470690 Using where
1 PRIMARY pgi_accounts ref PRIMARY PRIMARY 3 gf3data_momutech.pgi_raw_data.account 29 Using index
2 DERIVED pgi_summary_tbl ALL NULL NULL NULL NULL 42282 Using temporary; Using filesort
Any help on indexing would help.
At a minimum you need indexes on these fields:
pgi_raw_data.wsp_channel,
pgi_raw_data.dated,
pgi_raw_data.account
pgi_raw_data.event_id,
summary_max.wsp_channel,
summary_max.race_date,
pgi_accounts.account
The general (not always) rule is anything you are sorting, grouping, filtering or joining on should have an index.
Also: pgi_summary_tbl.wsp
Also, why the order by null?
The first thing is to be sure that you have indexes on pgi_summary_tbl(wsp_channel, race_date) and pgi_accounts(account). For this query, you don't need indexes on these columns in the raw data.
MySQL has a tendency to use indexes even when they are not the most efficient path. I would start by looking at the performance of the "full" query, without the joins:
SELECT pgi_raw_data.wsp_channel,
'IOM' AS wsp,
pgi_raw_data.dated,
-- pgi_accounts.`master`,
pgi_raw_data.event_id,
pgi_raw_data.breed,
Sum(pgi_raw_data.handle),
Sum(pgi_raw_data.payout),
Sum(pgi_raw_data.rebate),
Sum(pgi_raw_data.profit)
FROM pgi_raw_data
GROUP BY pgi_raw_data.event_id
If this has better performance, you may have a situation where the indexes are working against you. The specific problem is called "thrashing". It occurs when a table is too big to fit into memory. Often, the fastest way to deal with such a table is to just read the whole thing. Accessing the table through an index can result in an extra I/O operation for most of the rows.
If this works, then do the joins after the aggregate. Also, consider getting more memory, so the whole table will fit into memory.
Second, if you have to deal with this type of data, then partitioning the table by date may prove to be a very useful option. This will allow you to significantly reduce the overhead of reading the large table. You do have to be sure that the summary table can be read the same way.
I have a simple MySQL query, but when I have a lot of records (currently about 103,000), the performance is really slow and it says it is using filesort. I'm not sure if this is why it is slow. Does anyone have any suggestions to speed it up, or to stop it using filesort?
MYSQL query :
SELECT *
FROM adverts
WHERE (price >= 0)
AND (status = 1)
AND (approved = 1)
ORDER BY date_updated DESC
LIMIT 19990, 10
The Explain results :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE adverts range price price 4 NULL 103854 Using where; Using filesort
Here is the adverts table and indexes:
CREATE TABLE `adverts` (
`advert_id` int(10) NOT NULL AUTO_INCREMENT,
`user_id` int(10) NOT NULL,
`type_id` tinyint(1) NOT NULL,
`breed_id` int(10) NOT NULL,
`advert_type` tinyint(1) NOT NULL,
`headline` varchar(50) NOT NULL,
`description` text NOT NULL,
`price` int(4) NOT NULL,
`postcode` varchar(7) NOT NULL,
`town` varchar(60) NOT NULL,
`county` varchar(60) NOT NULL,
`latitude` float NOT NULL,
`longitude` float NOT NULL,
`telephone1` varchar(15) NOT NULL,
`telephone2` varchar(15) NOT NULL,
`email` varchar(80) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`approved` tinyint(1) NOT NULL DEFAULT '0',
`date_created` datetime NOT NULL,
`date_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`expiry_date` datetime NOT NULL,
PRIMARY KEY (`advert_id`),
KEY `price` (`price`),
KEY `user` (`user_id`),
KEY `type_breed` (`type_id`,`breed_id`),
KEY `headline_keywords` (`headline`),
KEY `date_updated` (`date_updated`),
KEY `type_status_approved` (`advert_type`,`status`,`approved`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
The problem is that MySQL only uses one index when executing the query. If you add a new index that uses the 3 fields in your WHERE clause, it will find the rows faster.
ALTER TABLE `adverts` ADD INDEX price_status_approved(`price`, `status`, `approved`);
According to the MySQL documentation ORDER BY Optimization:
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
The key used to fetch the rows is not the same as the one used in the ORDER BY.
This is what happens in your case.
As the output of EXPLAIN tells us, the optimizer uses the key price to find the rows. However, the ORDER BY is on the field date_updated which does not belong to the key price.
To find the rows faster AND sort the rows faster, you need to add an index that contains all the fields used in the WHERE and in the ORDER BY clauses:
ALTER TABLE `adverts` ADD INDEX status_approved_date_updated(`status`, `approved`, `date_updated`);
The field used for sorting must be in the last position in the index. It is useless to include price in the index, because the condition used in the query will return a range of values.
If EXPLAIN still shows that it is using filesort, you may try forcing MySQL to use an index you choose:
SELECT adverts.*
FROM adverts
FORCE INDEX(status_approved_date_updated)
WHERE price >= 0
AND adverts.status = 1
AND adverts.approved = 1
ORDER BY date_updated DESC
LIMIT 19990, 10
It is usually not necessary to force an index, because the MySQL optimizer most often makes the correct choice. But sometimes it makes a bad choice, or not the best choice. You will need to run some tests to see if it improves performance or not.
Remove the ticks around the '0' - it currently may prevent using the index but I am not sure.
Nevertheless it is better style since price is int type and not a character column.
SELECT adverts.*
FROM adverts
WHERE (
price >= 0
)
AND (
adverts.status = 1
)
AND (
adverts.approved = 1
)
ORDER BY date_updated DESC
LIMIT 19990 , 10
MySQL does not make use of the key date_updated for the sorting but just uses the price key, as it is used in the WHERE clause. You could try to use index hints:
http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Add something like
USE KEY FOR ORDER BY (date_updated)
I have two suggestions. First, remove the quotes around the zero in your where clause. That line should be:
price >= 0
Second, create this index:
CREATE INDEX `helper` ON `adverts`(`status`,`approved`,`price`,`date_updated`);
This should allow MySQL to find the 10 rows specified by your LIMIT clause by using only the index. Filesort itself is not a bad thing... the number of rows that need to be processed is.
Your WHERE condition uses price, status, approved to select, and then date_updated is used to sort.
So you need a single index with those fields; I'd suggest indexing on approved, status, price and date_updated, in this order.
The general rule is placing WHERE equalities first, then ranges (more than, less or equal, between, etc), and sorting fields last. (Note that leaving one field out might make the index less usable, or even unusable, for this purpose).
CREATE INDEX advert_ndx ON adverts (approved, status, price, date_updated);
This way, access to the table data is only needed after LIMIT has worked its magic, and you will slow-retrieve only a small number of records.
I'd also remove any unneeded indexes, which would speed up INSERTs and UPDATEs.
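The idea of touching full rows only after LIMIT has been applied can also be written out explicitly. The sketch below uses a "deferred join", which is a different technique from the index-only suggestions above: an inner query pages over ids only, and the wide rows are fetched for just those ten ids. The alias page_ids is hypothetical, and the sketch assumes an index starting with (status, approved, date_updated) exists.

```sql
SELECT a.*
FROM adverts a
INNER JOIN (
    -- Page over the narrow id list first; no wide columns are read here.
    SELECT advert_id
    FROM adverts
    WHERE status = 1
      AND approved = 1
      AND price >= 0
    ORDER BY date_updated DESC
    LIMIT 19990, 10
) AS page_ids ON page_ids.advert_id = a.advert_id
-- Re-apply the ordering, since the join does not guarantee it.
ORDER BY a.date_updated DESC;
```

How much this helps depends on whether the inner query can be answered from an index alone. On this MyISAM table that would require price and advert_id to be part of the index as well, which is an assumption to test with EXPLAIN.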
I have a MySQL query as follows:
SELECT KeywordText, SUM(Frequency) AS Frequency FROM Keyword, Keyword_Polling_Frequency_Index
WHERE Keyword.KeywordText
IN ('deal', 'obama' and other keywords...)
AND RSSFeedNo IN (106, 107 and other RSS feeds)
AND PollingDateTime
BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00'
AND Keyword.KeywordNo = Keyword_Polling_Frequency_Index.KeywordNo
GROUP BY Keyword.KeywordText
ORDER BY Keyword.KeywordText ASC
The query is used by an hourly batch program which involves two tables and is meant to get the frequencies of a list of keywords from a list of RSS feeds for a given hour. The Keyword_Polling_Frequency_Index table has a composite primary key of KeywordNo, RSSFeedNo and PollingDateTime. The query joins this table to the Keyword table, which contains the KeywordText. The KeywordText column has a MySQL MyISAM full-text index.
In testing this was found to perform satisfactorily but has now started running very slowly and affects the interactive speed of pages of the application. When I check the MySQL logs, I find that MySQL is creating temporary tables.
So, my question is, given that this query has to handle dozens of keywords in dozens of RSS feeds to calculate the frequencies, can anyone suggest an optimisation?
I have thought of breaking the query up by keyword but am not convinced of the practicality of this.
Can anyone help?
I am using MySQL Community Edition 5.X and an EXTENDED EXPLAIN of a version of this query is shown above.
SQL for the tables is as follows:
CREATE TABLE `keyword` (
`KeywordNo` int(10) unsigned NOT NULL AUTO_INCREMENT,
`KeywordText` varchar(64) NOT NULL,
`UserOriginated` enum('TRUE','FALSE') NOT NULL,
`Active` enum('TRUE','FALSE') NOT NULL,
`UserNo` varchar(50) NOT NULL,
`StopWord` enum('TRUE','FALSE') NOT NULL,
`CreatedDate` date NOT NULL,
`CreatedTime` time NOT NULL,
PRIMARY KEY (`KeywordNo`),
FULLTEXT KEY `KEYWORDTEXT` (`KeywordText`)
) ENGINE=MyISAM AUTO_INCREMENT=44047 DEFAULT CHARSET=latin1$$
CREATE TABLE `keyword_polling_frequency_index` (
`KeywordNo` int(10) unsigned NOT NULL,
`RSSFeedNo` int(10) unsigned NOT NULL,
`PollingDateTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Frequency` int(10) NOT NULL,
`Active` enum('TRUE','FALSE') NOT NULL,
`UserNo` varchar(50) NOT NULL,
PRIMARY KEY (`KeywordNo`,`RSSFeedNo`,`PollingDateTime`),
KEY `FK_keyword_polling_frequency_index_1` (`UserNo`),
CONSTRAINT `FK_keyword_polling_frequency_index_1` FOREIGN KEY (`UserNo`) REFERENCES `user` (`UserNo`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$
As mentioned previously, add an index to the PollingDateTime field in the order mentioned as well. This is my suggestion:
SELECT
K.KeywordText,
SUM(F.Frequency) AS Frequency
FROM
Keyword K, Keyword_Polling_Frequency_Index F
WHERE
EXISTS
(
SELECT 1
FROM Keyword K1
WHERE
MATCH (K1.KeywordText) AGAINST ('deal obama "another keyword" yetanother' IN BOOLEAN MODE)
AND K1.KeywordNo = K.KeywordNo
)
AND K.KeywordNo = F.KeywordNo
AND F.PollingDateTime BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00'
AND F.RSSFeedNo IN (106, 107, 110)
GROUP BY K.KeywordText
ORDER BY K.KeywordText ASC
This will probably reduce the number of records for the comparison (SQL inside-out parsing) instead of directly matching two tables (N x N).
If you don't have any indexes you should create relevant indexes.
The minimum index is on keyword_polling_frequency_index.PollingDateTime
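A minimal sketch of that index follows, with hypothetical index names. The second statement is an additional assumption: a FULLTEXT index is used only by MATCH ... AGAINST, so the IN (...) list in the original query would need a regular index on KeywordText to avoid scanning the keyword table.

```sql
-- Lets the one-hour BETWEEN range be located without scanning the table.
CREATE INDEX idx_polling_datetime
    ON keyword_polling_frequency_index (PollingDateTime);

-- Regular (B-tree) index for KeywordText IN ('deal', 'obama', ...) lookups.
CREATE INDEX idx_keyword_text
    ON keyword (KeywordText);
```

Running EXPLAIN on the query before and after shows whether the optimizer actually picks either index. With the primary key starting at KeywordNo, it may still prefer to drive the join from the keyword table.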