SQL database index design for inner join keyword search - mysql

I have this query
SELECT a.*
FROM entries a
INNER JOIN entries_keywords b ON a.id = b.entry_id
INNER JOIN keywords c ON b.keyword_id = c.id
WHERE c.key IN ('wake', 'up')
GROUP BY a.id
HAVING COUNT(*) = 2
but it's slow. How do I design indexes optimally to speed things up?
EDIT
This is the current schema
CREATE TABLE `entries` (`id` integer PRIMARY KEY AUTOINCREMENT, `sha` text);
CREATE TABLE `entries_keywords` (`id` integer PRIMARY KEY AUTOINCREMENT, `entry_id` integer REFERENCES `entries`, `keyword_id` integer REFERENCES `keywords`);
CREATE TABLE `keywords` (`id` integer PRIMARY KEY AUTOINCREMENT, `key` string);
CREATE INDEX `entries_keywords_entry_id_index` ON `entries_keywords` (`entry_id`);
CREATE INDEX `entries_keywords_entry_id_keyword_id_index` ON `entries_keywords` (`entry_id`, `keyword_id`);
CREATE INDEX `entries_keywords_keyword_id_index` ON `entries_keywords` (`keyword_id`);
CREATE INDEX `keywords_key_index` ON `keywords` (`key`);
I'm using SQLite3; the query doesn't fail, but it is slow.
Right now I'm using a query like this (a subquery for each keyword):
select *
from (
    select e.*
    from entries e
    inner join entries_keywords ek on e.id = ek.entry_id
    inner join keywords k on ek.keyword_id = k.id
    where k.key = 'wake'
) e
inner join entries_keywords ek on e.id = ek.entry_id
inner join keywords k on ek.keyword_id = k.id
where k.key = 'up';
This is way faster but doesn't feel right since it's going to get ugly if I have a lot of keywords.
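For reference, the GROUP BY/HAVING form at the top generalizes to any number of keywords with a single IN list; a minimal sketch (the placeholders are illustrative, and COUNT(DISTINCT c.key) guards against duplicate rows in entries_keywords):
-- One placeholder per keyword; the HAVING count must equal the
-- number of keywords searched (here 2).
SELECT a.*
FROM entries a
INNER JOIN entries_keywords b ON a.id = b.entry_id
INNER JOIN keywords c ON b.keyword_id = c.id
WHERE c.key IN (?, ?)
GROUP BY a.id
HAVING COUNT(DISTINCT c.key) = 2;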

The key indexes required for that query are:
keywords(key)
entries_keywords(keyword_id,entry_id)
entries(id)
You must be using MySQL, because the SELECT a.* would otherwise fail.
EDIT: after the second comment about this statement, let me point out why SELECT a.* would fail here - it's because of the GROUP BY. In standard SQL, every selected column must either appear in the GROUP BY or be aggregated; MySQL's (and SQLite's) looser rules let SELECT a.* through.
To explain: because the filter (WHERE) is on c.key, that column needs to be indexed.
The join then proceeds through b.keyword_id. We include b.entry_id in that index so that the lookup never has to touch the table itself - the index alone covers all the columns required.
Finally, a.id = b.entry_id joins back to the entries table, so we index the id of that table.
It is quite likely that entries(id) is already the primary key, but note that having entries_keywords indexed the other way around, as (entry_id, keyword_id), will not satisfy this join.
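Concretely, a sketch of the one missing piece of DDL (the index name is illustrative; keywords(key) and the entries primary key already exist in the schema above):
-- Keyword-first composite index: the lookup through c.key and the
-- join into entries_keywords are both covered by the index alone.
CREATE INDEX entries_keywords_keyword_id_entry_id_index
    ON entries_keywords (keyword_id, entry_id);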

Related

JOIN query taking long time and creating issue "converting HEAP to MyISAM"

My query is like the one below; I used a join to fetch the data. Can you please suggest how I can solve the "converting HEAP to MyISAM" issue?
Can I use a subquery here to rewrite it? Please suggest how.
I have joined the users table to check whether the user exists. Can I refine it without the join so that the "converting HEAP to MyISAM" problem goes away?
One more thing: sometimes I will not filter on a specific user_id, though here I have added user_id = 16082.
SELECT `user_point_logs`.`id`,
`user_point_logs`.`user_id`,
`user_point_logs`.`point_get_id`,
`user_point_logs`.`point`,
`user_point_logs`.`expire_date`,
`user_point_logs`.`point` AS `sum_point`,
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0, sum(`user_point_used_logs`.`point`)) AS `minus`
FROM `user_point_logs`
JOIN `users` ON ( `users`.`id` = `user_point_logs`.`user_id` )
LEFT JOIN (
    SELECT *
    FROM user_point_used_logs
    WHERE user_point_log_id NOT IN (
        SELECT DISTINCT return_id
        FROM user_point_logs
        WHERE return_id IS NOT NULL
          AND user_id = 16082
    )
) AS user_point_used_logs
ON ( `user_point_logs`.`id` = `user_point_used_logs`.`user_point_log_used_id` )
WHERE expire_date >= 1563980400
AND `user_point_logs`.`point` >= 0
AND `users`.`id` IS NOT NULL
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
AND `user_point_logs`.`user_id` = '16082'
GROUP BY `user_point_logs`.`id`
ORDER BY `user_point_logs`.`expire_date` ASC
DB FIDDLE HERE WITH STRUCTURE
Kindly try this. If it works, we will optimize further by adding a composite index.
SELECT
upl.id,
upl.user_id,
upl.point_get_id,
upl.point,
upl.expire_date,
upl.point AS sum_point,
coalesce(SUM(upul.point), 0) AS minus -- changed from the IF() wrapper to the more readable COALESCE
FROM user_point_logs upl
JOIN users u ON upl.user_id = u.id
LEFT JOIN (
    SELECT supul.user_point_log_used_id, supul.point
    FROM user_point_used_logs supul
    LEFT JOIN user_point_logs supl
        ON supul.user_point_log_id = supl.return_id
        AND supl.user_id = 16082
    WHERE supl.return_id IS NULL -- anti-join: keeps only non-returned usage rows
) AS upul
ON upl.id = upul.user_point_log_used_id
WHERE
upl.user_id = 16082 and coalesce(upl.return_id,0)= 0
and upl.expire_date >= 1563980400 -- tip: if its unix timestamp change the datatype and if possible use range between dates
#AND upl.point >= 0 -- since its NN by default removing this condition
#AND u.id IS NOT NULL -- removed since the inner join matches not null
GROUP BY upl.id
ORDER BY upl.expire_date ASC;
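On the expire_date tip above, a hedged sketch of the suggested datatype change (the migration itself is an assumption, not part of the original answer):
-- Assumption: expire_date currently stores unix timestamps.
ALTER TABLE user_point_logs ADD COLUMN expire_date_dt DATETIME;
UPDATE user_point_logs SET expire_date_dt = FROM_UNIXTIME(expire_date);
-- The filter then becomes a plain range predicate:
-- WHERE expire_date_dt >= FROM_UNIXTIME(1563980400)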
Edit:
Try adding an index on the return_id column of the user_point_logs table, since this column is used in the join in the derived query.
Or use a composite index on (user_id, return_id).
Indexes:
user_point_logs: (user_id, expire_date)
user_point_logs: (user_id, return_id)
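A sketch of the corresponding DDL (index names are illustrative):
ALTER TABLE user_point_logs
    ADD INDEX idx_user_expire (user_id, expire_date),
    ADD INDEX idx_user_return (user_id, return_id);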
OR is hard to optimize. Decide on a single way to represent this condition, then get rid of the OR (see the sketch after the snippet):
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
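For instance, a minimal sketch assuming you settle on 0 as the single way to say "no return" (the one-time data fix is an assumption, not part of the original answer):
-- One-time normalization: represent "no return" as 0 everywhere.
UPDATE user_point_logs SET return_id = 0 WHERE return_id IS NULL;
-- The predicate then becomes index-friendly:
-- AND user_point_logs.return_id = 0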
DISTINCT is redundant:
NOT IN ( SELECT DISTINCT ... )
Change
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0,
sum(`user_point_used_logs`.`point`)) AS `minus`
to
COALESCE( ( SELECT SUM(point) FROM user_point_used_logs ... ), 0) AS minus
and toss LEFT JOIN (SELECT * FROM user_point_used_logs ... )
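A hedged sketch of what that correlated form might look like; the correlation condition on user_point_log_used_id is an assumption, since the original elides it with "...":
SELECT upl.id,
       upl.user_id,
       upl.point,
       COALESCE((SELECT SUM(upul.point)
                 FROM user_point_used_logs upul
                 WHERE upul.user_point_log_used_id = upl.id), 0) AS minus
FROM user_point_logs upl
WHERE upl.user_id = 16082;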
Since a PRIMARY KEY is a key, the second of these is redundant and can be DROPped:
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`) USING BTREE;
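A sketch of that cleanup (assuming the pair above comes from the fiddle's user_point_used_logs table; adjust the table name to wherever the duplicate key lives):
ALTER TABLE user_point_used_logs DROP INDEX `id`;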
After all that, we may need another pass to further simplify and optimize it.

MySQL, Using indexes with subquery

Still trying to learn my way around indexes: shouldn't the JOINs in the outer query be using the index on the primary key? Do indexes not work in combination with subqueries? Thanks!
SELECT SQL_BIG_RESULT
I.item_group_id
FROM
(
SELECT SQL_BIG_RESULT
MAX(ITM.id) as max_id
FROM a_movements M
JOIN a_items_to_movements ITM ON ITM.movement_id = M.id -- Index used
WHERE M.warehouse_id IN (...) -- Index used
GROUP BY ITM.item_id
ORDER BY NULL
) X
JOIN a_items_to_movements ITM ON ITM.id = X.max_id -- Index not used
JOIN a_movements M ON M.id = ITM.movement_id
AND M.direction = 0
AND M.settled IS NOT NULL
JOIN a_items I ON I.id = ITM.item_id -- Index not used
GROUP BY I.item_group_id
ORDER BY NULL
EDIT: attached EXPLAIN output here: https://imgur.com/PdO3mIo

MYSQL order large database by decimal 10,10

I have about 25 million rows containing values such as 0.183463545, 0.183423545, 0.183443545, 0.183443445, 0.183447545, and so on.
I need to order these, but it currently takes around 20 seconds. Is there any way to speed it up? As far as I know, I have my index put in place correctly.
Thank you!
SELECT `a`.`float_val`,
`a`.`num_id`,
`b`.`userID`,
`c`.`img`,
`c`.`username`,
`b`.`img`,
`d`.`exterior`
FROM `a`
INNER JOIN `b` ON `b`.`num_id` = `a`.`num_id`
INNER JOIN `d` ON `d`.`id` = `b`.`item`
INNER JOIN `c` ON `c`.`userID` = `b`.`userID`
WHERE `float_val` IS NOT NULL
AND `float_val` BETWEEN 0 AND 1
AND `username` = 'ABC'
ORDER BY `float_val` LIMIT 100
Index is on float_val, num_id, userID
CREATE TABLE `float` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`num_id` bigint(11) DEFAULT NULL,
`float_val` decimal(10,10) DEFAULT NULL,
`userID` char(17) DEFAULT NULL,
`last_checked` datetime DEFAULT NULL,
`index10` smallint(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `floatID` (`num_id`),
UNIQUE KEY `num_id` (`num_id`,`userID`),
KEY `userID` (`userID`),
KEY `float_val` (`float_val`),
KEY `last_checked` (`last_checked`),
KEY `index10` (`index10`)
) ENGINE=InnoDB AUTO_INCREMENT=25750916 DEFAULT CHARSET=latin1;
Edited to reflect table definitions shown.
Start by doing a query to obtain the lowest float_val rows from table a.
SELECT a.id
FROM a
INNER JOIN b ON b.num_id = a.num_id
INNER JOIN d ON d.id = b.item
INNER JOIN c ON c.userID = b.userID
WHERE c.username = 'ABC'
AND a.float_val BETWEEN 0.0 AND 1.0
ORDER BY a.float_val
LIMIT 100
If you have an index on a(float_val, num_id) and another on c(username), this will be reasonably fast. It will spit out the id values for the rows of a that are candidates for your query. (If you're using MyISAM, you need an index on a(float_val, num_id, id) instead.) By the way, the BETWEEN clause also implies IS NOT NULL.
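A sketch of that DDL, using the anonymized table names from the question (index names are illustrative):
CREATE INDEX a_float_val_num_id ON a (float_val, num_id);
CREATE INDEX c_username ON c (username);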
Then use that as a subquery to complete your query, as follows.
SELECT a.float_val,
a.num_id,
b.userID,
c.img,
c.username,
b.img,
d.exterior
FROM a
INNER JOIN (
    SELECT a.id
    FROM a
    INNER JOIN b ON b.num_id = a.num_id
    INNER JOIN d ON d.id = b.item
    INNER JOIN c ON c.userID = b.userID
    WHERE c.username = 'ABC'
      AND a.float_val BETWEEN 0.0 AND 1.0
    ORDER BY a.float_val
    LIMIT 100
) q ON a.id = q.id
INNER JOIN b ON b.num_id = a.num_id
INNER JOIN d ON d.id = b.item
INNER JOIN c ON c.userID = b.userID
WHERE c.username = 'ABC'
AND a.float_val BETWEEN 0.0 AND 1.0
ORDER BY a.float_val LIMIT 100
This kind of query contains a deferred join. That dramatically reduces the number of rows of the full query that need to be subjected to ORDER BY ... LIMIT. Without the deferred join, your original query sorts an enormous mess of quite long rows just to discard all but the first hundred of them. That's why it takes so long.
This should help. The next optimization step is to look at the EXPLAIN output from this query and the exact definitions of your tables.
Pro tip: In queries of this complexity, always qualify column names with table names or aliases. That is, use a.float_val throughout, rather than just float_val. This is a kindness to the next person to look at the query.

How to speed up left join queries by indexing?

At the moment I am experiencing some slower MySQL queries in my application which I want to speed up. Unfortunately I’m not quite sure which is the correct way to do it.
I have the following (fictitious) tables: Book, Page and Word.
Word is child of Page by word_page_id
Page is child of Book by page_book_id
I already have individual indexes on page_book_id, word_page_id, book_user_id and book_flag_delete.
SELECT `book`.*, COUNT(word_id) AS `word_amount` FROM `book`
LEFT JOIN `page` ON page_book_id = book_id
LEFT JOIN `word` ON word_page_id = page_id
WHERE (book_user_id = 1) AND (book_flag_delete IS NULL)
GROUP BY `book_id`
ORDER BY `book_id` ASC LIMIT 100
SELECT COUNT(DISTINCT `book_id`) AS `book_row_count` FROM `book`
LEFT JOIN `page` ON page_book_id = book_id
LEFT JOIN `word` ON word_page_id = page_id
WHERE (book_user_id = 59) AND (book_flag_delete IS NULL)
Any ideas how to speed up such queries?
Is there extra indexing involved?
Set indexes on the fields you use for joining.
Further, make sure that these have the same datatype, encoding, and collation, or else the index will not be used.
mysql> EXPLAIN <query> will show you the fields actually used (the key column in the output) and the available indexes (the possible_keys field). An example follows.
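For example, against the first query from the question:
EXPLAIN SELECT `book`.*, COUNT(word_id) AS `word_amount` FROM `book`
LEFT JOIN `page` ON page_book_id = book_id
LEFT JOIN `word` ON word_page_id = page_id
WHERE (book_user_id = 1) AND (book_flag_delete IS NULL)
GROUP BY `book_id`
ORDER BY `book_id` ASC LIMIT 100;
-- Inspect the key and possible_keys columns in the output.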
For this query:
SELECT b.*, COUNT(w.word_id) AS `word_amount`
FROM `book` b LEFT JOIN
`page` p
ON p.page_book_id = b.book_id LEFT JOIN
`word` w
ON w.word_page_id = p.page_id
WHERE (b.book_user_id = 1) AND (b.book_flag_delete IS NULL)
GROUP BY b.`book_id`
ORDER BY b.`book_id` ASC
LIMIT 100;
The best indexes are: book(book_user_id, book_flag_delete, book_id), page(page_book_id, page_id), and word(word_page_id, word_id).
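A sketch of that DDL (index names are illustrative):
CREATE INDEX book_user_del_id ON book (book_user_id, book_flag_delete, book_id);
CREATE INDEX page_book_page ON page (page_book_id, page_id);
CREATE INDEX word_page_word ON word (word_page_id, word_id);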
However, the overall group by might be expensive. You might try writing the query as:
SELECT b.*,
       (SELECT COUNT(w.word_id)
        FROM `page` p JOIN
             `word` w
             ON w.word_page_id = p.page_id
        WHERE p.page_book_id = b.book_id
       ) AS `word_amount`
FROM `book` b
WHERE (b.book_user_id = 1) AND (b.book_flag_delete IS NULL)
ORDER BY b.`book_id` ASC
LIMIT 100;
The same indexes work here. But this query should avoid a GROUP BY over all the data at once (instead, it uses the indexes for the aggregation).
The optimal schema for a many-to-many mapping table is
CREATE TABLE XtoY (
# No surrogate id for this table
x_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to one table
y_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to the other table
# Include other fields specific to the 'relation'
PRIMARY KEY(x_id, y_id), -- When starting with X
INDEX (y_id, x_id) -- When starting with Y
) ENGINE=InnoDB;
The details on 'why' are in my index cookbook
In your SELECT, you'll want to refrain from using the wildcard * to grab columns. Plus, utilize aliases, always! Qualifying every column with an alias saves the database from having to resolve unqualified names itself, and keeps the query readable.
select book1.column1, book1.column2, page1.column1
from book book1
left join page page1
on page1.page_book_id = book1.book_id
..... blah

MySQL optimizer join calculation rule

I read about MySQL join optimization and found that MySQL decides which join is performed first, and I want to know exactly what criterion MySQL uses to make that decision.
I followed the MySQL example and created these tables:
create table a(
col int default null,
index a_index(col)
);
create table b(
col int default null,
index a_index(col)
);
create table c(
col int default null,
index a_index(col)
);
create table d(
col int default null,
index a_index(col)
);
And ran its example query:
explain SELECT *
FROM a JOIN b LEFT JOIN c ON (c.col=a.col)
LEFT JOIN d ON (d.col=a.col)
WHERE b.col=d.col;
but I see that the join order it chooses is the same as when I write
explain SELECT *
FROM b JOIN a LEFT JOIN c ON (c.col=a.col)
LEFT JOIN d ON (d.col=a.col)
WHERE b.col=d.col;
Same indexes used, same Extra field shown; the join order is b, a, d, c either way.
And I want to know why it does d, c and not c, d.
Inner joins are commutative.
This means that the order of evaluation doesn't matter: A JOIN B gives the same result as B JOIN A, and the optimizer can swap the order of join evaluation if it thinks that gives better performance.
Outer joins are not commutative.
A LEFT JOIN B is not the same as B LEFT JOIN A, their results can differ.
In this case the optimizer cannot change the order of evaluation: it must always evaluate A first, and only then B.
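A minimal sketch illustrating the non-commutativity (tables and values invented for the illustration):
CREATE TABLE t1 (col INT);
CREATE TABLE t2 (col INT);
INSERT INTO t1 VALUES (1), (2);
INSERT INTO t2 VALUES (2);

-- Keeps t1's unmatched row (1) with NULLs: returns 2 rows.
SELECT * FROM t1 LEFT JOIN t2 ON t1.col = t2.col;

-- Returns only the matching row (2): a different result, which is
-- why the optimizer may not swap the two sides of a LEFT JOIN.
SELECT * FROM t2 LEFT JOIN t1 ON t2.col = t1.col;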