I'm trying to understand the proper way to assign indexes on a lookup table. Given the following tables and sample query, what are the most efficient primary/additional indexes for the lookup table?
Table: items (id, title, etc.)
Table: categories (id, title, etc.)
Table: lookup (category_id, item_id, type, etc.)
SELECT * FROM items
INNER JOIN lookup ON
lookup.item_id=items.id AND lookup.type="items"
INNER JOIN categories ON
categories.id=lookup.category_id;
For this query:
SELECT *
FROM items i JOIN
lookup l
ON l.item_id = i.id AND l.type = 'items' JOIN
categories c
ON c.id = l.category_id;
The best indexes are probably:
lookup(type, item_id)
categories(id) (probably there already if id is a primary key)
items(id) (probably there already if id is a primary key)
Under some circumstances, this may not be a big improvement, particularly if most lookup() rows have a type of "items".
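As a rough sketch, the lookup index from the list above could be created and checked as follows. The index name idx_lookup_type_item is made up for illustration, and what EXPLAIN reports will depend on your data and MySQL version.

```sql
-- Hypothetical index name; the column order comes from the answer above.
CREATE INDEX idx_lookup_type_item ON lookup (type, item_id);

-- Check whether the optimizer actually picks the new index for the join.
EXPLAIN
SELECT *
FROM items i
JOIN lookup l ON l.item_id = i.id AND l.type = 'items'
JOIN categories c ON c.id = l.category_id;
```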
Apart from the join predicates, your query has only a single filtering predicate (lookup.type = "items"). If this predicate has good selectivity (i.e. it selects 5% or less of the rows), then you should use it as the first column of the index. I would do:
create index ix1 on lookup (type, item_id, category_id)
If the id columns on the table items and categories represent the primary keys, then there's nothing else to do.
The engine will probably read the lookup table using the index, and then will read the other two tables using their PK indexes.
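One way to judge the selectivity mentioned above is to count rows per type. This is only a sketch; the aliases n_rows and fraction are illustrative.

```sql
-- Share of lookup rows per type. A small fraction for 'items' suggests
-- that leading the index with type is worthwhile.
SELECT type,
       COUNT(*) AS n_rows,
       COUNT(*) / (SELECT COUNT(*) FROM lookup) AS fraction
FROM lookup
GROUP BY type
ORDER BY n_rows DESC;
```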
Do not have an auto_incr id for the mapping table.
Have
PRIMARY KEY(type, item_id, category_id),
INDEX(category_id, type, item_id)
For the second index, will you need type when going from a category to an item? If not, leave it out.
More: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
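A minimal sketch of such a mapping table is shown below. The column types and the index name idx_category are assumptions, since the question does not give them; adjust the types to match items.id and categories.id.

```sql
-- No auto-increment id: the natural key is the primary key.
CREATE TABLE lookup (
    type        VARCHAR(20)  NOT NULL,   -- assumed type
    item_id     INT UNSIGNED NOT NULL,   -- assumed type
    category_id INT UNSIGNED NOT NULL,   -- assumed type
    PRIMARY KEY (type, item_id, category_id),        -- item -> categories
    INDEX idx_category (category_id, type, item_id)  -- category -> items
) ENGINE=InnoDB;
```

If type is not needed when going from a category to its items, the second index could be reduced to (category_id, item_id).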
Related
I am running a query on three tables messages, message_recipients and users.
Table structure of messages table:
id int pk
message_id int
message text
user_id int
...
Index for this table is on user_id, message_id and id.
Table structure of message_recipients table:
id int pk
message_id int
read_date datetime
user_id int
...
Index is on id, message_id and user_id.
Table structure of users table:
id int pk
display_name varchar
...
Index is on id.
I am running the following query against these tables:
SELECT
m.*,
if(m.user_id = 0, 'Campus Manager', u.display_name) AS name,
mr.read_date,
IF(m1.message_id > 0 and m1.user_id=1, true, false) as replied
FROM
messages m
JOIN
message_recipients mr
ON
mr.message_id = m.id
LEFT JOIN
users u
ON
u.UID = m.user_id
LEFT JOIN
messages m1
ON
m1.message_id = m.id
WHERE
mr.user_id = 1
AND
m.published = 1
GROUP BY
mr.message_id
ORDER BY
m.created DESC
EXPLAIN returns the following data for this query:
UPDATE
As suggested by @e4c5, I added a new composite index on (published, user_id, created), and now the EXPLAIN output shows this:
How can this query be optimized by adding the required indexes (if any)? It is taking a lot of time.
GROUP BY needs to list all the non-aggregated columns. I suspect that would be a mess. Why do you need GROUP BY at all?
Why are you linking messages.id to message_id? Is this a hierarchical table, but the column names aren't like 'parent_id'?
"Index is on id, message_id and user_id" -- is that one composite index or 3 single-column indexes? (It makes a big difference.) It would be better to show us SHOW CREATE TABLE instead of ambiguously paraphrasing.
Is user_id=1 prolific? That is, are you expecting thousands of rows? Is this query only a problem for him?
Using LEFT JOIN implies that m1.message_id could be NULL, yet the reference to it seems to ignore that possibility.
If this is a single table that contains a message thread, both the main info about the thread and the individual responses, then I suggest it is a bad design. (I made this mistake once upon a time.) I think it is better to have a table with one row per thread and another table with one row per comment. 1 thread : many comments. So there would be a thread_id in the comment table.
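A sketch of that two-table layout might look like the following. The table names, column names, and types are all hypothetical; only the idea of a thread_id in the comment table comes from the answer.

```sql
-- One row per thread (hypothetical schema).
CREATE TABLE threads (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id   INT UNSIGNED NOT NULL,
    published TINYINT      NOT NULL DEFAULT 0,
    created   DATETIME     NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- One row per comment; thread_id points back at the thread.
CREATE TABLE comments (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    thread_id INT UNSIGNED NOT NULL,
    user_id   INT UNSIGNED NOT NULL,
    message   TEXT         NOT NULL,
    created   DATETIME     NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_thread_user (thread_id, user_id)
) ENGINE=InnoDB;

-- "Has user 1 replied?" becomes an EXISTS test instead of a self-join
-- followed by GROUP BY.
SELECT t.*,
       EXISTS (SELECT 1
               FROM comments c
               WHERE c.thread_id = t.id AND c.user_id = 1) AS replied
FROM threads t
WHERE t.published = 1
ORDER BY t.created DESC;
```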
I was able to bring down the query time from 3 seconds to 0.1 second by adding a new index to messages and message_recipients table and changing the database engine of messages table to MyISAM from InnoDB.
Composite index composite added on these columns with respective order on messages table - published, user_id, created
Composite index message_id_2 added on two columns on message_recipients table - message_id, user_id
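For reference, the two indexes described above could be created with statements along these lines. The index names composite and message_id_2 are the ones given in the post; everything else is a sketch.

```sql
-- Composite index on messages, in the stated column order.
ALTER TABLE messages
    ADD INDEX composite (published, user_id, created);

-- Composite index on message_recipients.
ALTER TABLE message_recipients
    ADD INDEX message_id_2 (message_id, user_id);
```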
EXPLAIN Query now shows
Let's say I have two tables. For the sake of the question, let's assume they are called customers and cars.
Customers
id name age
Cars
id customer_id brand_id engine cc
Do we need to index customer_id? Does it give any advantage?
I'd like to highlight that InnoDB automatically creates an index on foreign key columns if one does not already exist (see innodb-foreign-key-constraints).
In your case that means customer_id, provided the foreign key constraint is applied.
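A small sketch of this behaviour, with assumed column types and a hypothetical constraint name fk_cars_customer:

```sql
CREATE TABLE customers (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,          -- assumed type
    age  TINYINT UNSIGNED,               -- assumed type
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE cars (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    brand_id    INT UNSIGNED NOT NULL,
    `engine`    VARCHAR(50),             -- assumed type
    cc          SMALLINT UNSIGNED,       -- assumed type
    PRIMARY KEY (id),
    CONSTRAINT fk_cars_customer
        FOREIGN KEY (customer_id) REFERENCES customers (id)
) ENGINE=InnoDB;

-- Lists the primary key plus the index InnoDB created for customer_id.
SHOW INDEX FROM cars;
```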
Yes, it does. You will probably want to join the customers table, so you need to put an index on customer_id to make the lookup faster.
But as said in the comments, it depends: if you're not going to join the customers table (or do a WHERE / GROUP BY / ORDER BY etc. on customer_id) and purely use it to display the id, it is not necessary.
Depending on your application's business logic and how you will query the database, having an index on customer_id will give you a huge advantage on queries like
select * from customers join cars on customer_id = customers.id -- list all customers with their associated cars
Or even
select * from cars where customer_id = 2 -- list all cars for user 2
More generally, it is a good idea to index foreign key columns.
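If no foreign key constraint is declared, the index has to be added by hand. A minimal sketch, with a made-up index name:

```sql
-- Hypothetical index name; skip this if a foreign key constraint
-- already created an index on customer_id.
CREATE INDEX idx_cars_customer_id ON cars (customer_id);

-- The key column of the EXPLAIN output shows whether the index is used.
EXPLAIN SELECT * FROM cars WHERE customer_id = 2;
```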
I have 4 tables: rooms(id, name, description), clients(id, name, email), cards(id, card_number, exp_date, client_id) and orders(id, client_id, room_id, card_id, start_date, end_date).
The tables are all InnoDB and are pretty simple. What I need is to add relationships between them. What I did was to assign cards.client_id as a Foreign Key to db.clients and orders.client_id, orders.room_id and orders.card_id as Foreign Keys to the other tables.
My question: is this way correct and reliable? I never had the need to use Foreign Key before now and this is my first try. All the Foreign Keys are also indexes.
Also, what's the easiest way to retrieve all the information I need for db.orders ?
I need a query to output: who the client is, what his card details are, what room(s) he ordered, and the period he's checked in for.
Can I accomplish this query based on the structure I created?
You must create the FK's in all columns that relate to other tables. In your case, create on: cards.client_id, orders.client_id, orders.room_id, orders.card_id
In the case of MySQL it automatically creates indexes for these FK's.
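As a sketch, those four foreign keys could be declared like this. The constraint names are hypothetical, and the statements assume the referenced id columns are primary keys with matching types.

```sql
ALTER TABLE cards
    ADD CONSTRAINT fk_cards_client
        FOREIGN KEY (client_id) REFERENCES clients (id);

ALTER TABLE orders
    ADD CONSTRAINT fk_orders_client
        FOREIGN KEY (client_id) REFERENCES clients (id),
    ADD CONSTRAINT fk_orders_room
        FOREIGN KEY (room_id) REFERENCES rooms (id),
    ADD CONSTRAINT fk_orders_card
        FOREIGN KEY (card_id) REFERENCES cards (id);
```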
On your select, I believe it can be the following:
SELECT * FROM orders
INNER JOIN clients ON clients.id = orders.client_id
INNER JOIN cards ON cards.id = orders.card_id
INNER JOIN rooms ON rooms.id = orders.room_id
I do not know which columns you need; just replace the * with the columns you actually need, so the query is faster.
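For example, a version with explicit columns might look like the sketch below. It assumes the card used for an order is the one referenced by orders.card_id; the aliases and the ORDER BY are illustrative choices.

```sql
SELECT c.name        AS client_name,
       c.email,
       ca.card_number,
       ca.exp_date,
       r.name        AS room_name,
       o.start_date,
       o.end_date
FROM orders o
INNER JOIN clients c ON c.id  = o.client_id
INNER JOIN cards  ca ON ca.id = o.card_id   -- the card used for this order
INNER JOIN rooms   r ON r.id  = o.room_id
ORDER BY c.name, o.start_date;
```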
Because it takes forever to make index changes on my 40 million row table, I was hoping to get some feedback to make sure I do it right the first time.
Right now my "favorites" table has 3 indexes:
Primary auto-increment index on (id)
item_idx (item_id) - the id of the item that was favorited
faver_id_idx (faver_profile_id, id) - for displaying favorites from a particular user starting with the most recent.
To check to see if the user has "faved" a particular item I use this query:
SELECT id FROM favorites
WHERE item_id = '.mysql_real_escape_string($item_id).'
AND faver_profile_id = '.mysql_real_escape_string($user['id']).'
AND removed = 0
Which is doing an index merge intersect:
Using intersect(item_idx,faver_id_idx)
This seems pretty inefficient to me, so I'm considering the following index setup:
Primary auto-increment index on (id)
item_faver_idx (item_id, removed, faver_profile_id)
faver_id_idx (faver_profile_id, removed, id)
The benefits I see are:
I can check if a user has faved an item without doing an intersect or table sort.
The "removed" (tinyint) column is now part of the index.
Questions I have:
In the (item_id, removed, faver_profile_id) index is there any reason to have faver_profile_id come first instead? For instance, if I'm doing the following query..
SELECT items.*, users.*, favorites.item_id
FROM items
LEFT JOIN users ON (items.submitter_id = users.id)
LEFT JOIN favorites ON (items.id = favorites.item_id AND favorites.faver_profile_id = 56 AND favorites.removed = 0)
ORDER BY items.id desc LIMIT 26
Would it be better to have faver_profile_id come first in the index so that it can just jump to the right faver_profile_id section of the index instead of having to check multiple item_id sections, and then scanning for the faver_profile_id within each of those sections?
Does it make sense to have "removed" in the index if only 1-3% of rows have a removed value of 1? Basically, is a slightly more efficient table scan worth the extra index size?
Anything I'm overlooking?
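Since every index change on a table this size is slow, one option is to apply the whole proposed setup in a single ALTER TABLE statement. The sketch below only transcribes the proposal above; it does not settle the column-order question, and the literal ids in the EXPLAIN are placeholders.

```sql
-- Replace the two existing secondary indexes with the proposed ones
-- in one statement.
ALTER TABLE favorites
    DROP INDEX item_idx,
    DROP INDEX faver_id_idx,
    ADD INDEX item_faver_idx (item_id, removed, faver_profile_id),
    ADD INDEX faver_id_idx (faver_profile_id, removed, id);

-- Re-check the "has this user faved this item" lookup afterwards.
EXPLAIN
SELECT id
FROM favorites
WHERE item_id = 123            -- placeholder item id
  AND faver_profile_id = 56
  AND removed = 0;
```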
I have 2 tables that manages the time spent on doing various things:
#times(id, time_in_minutes)
#times_intervals(id, times_id, time_in_minutes, start, end)
Then the #times might relate to different things:
#tasks(id, description)
#products(id, description, serial_number, year)
What is the best practice in order to reuse the same #times and #times_intervals for #tasks and #products?
I would think about:
#times(+task_id, +product_id)
// add task_id and product_id to the original #times table
But if I do so, joining the #times table with the #tasks and #products tables would be slower, because each join has to choose between the two (task_id or product_id): when task_id is not null, join on #tasks, and vice versa.
(I'm using MySQL6)
Thanks a lot
I would drop the time_in_minutes column from the times table. This information is redundant if it is just the sum of the detail and is a premature optimization.
I would add a product_time table containing product_id and times_id, and a task_time table containing task_id and times_id.
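A possible shape for those two link tables, with assumed column types and composite primary keys (the answer does not specify either):

```sql
CREATE TABLE product_time (
    product_id INT UNSIGNED NOT NULL,   -- references the products table
    times_id   INT UNSIGNED NOT NULL,   -- references the times table
    PRIMARY KEY (product_id, times_id)
) ENGINE=InnoDB;

CREATE TABLE task_time (
    task_id  INT UNSIGNED NOT NULL,     -- references the tasks table
    times_id INT UNSIGNED NOT NULL,     -- references the times table
    PRIMARY KEY (task_id, times_id)
) ENGINE=InnoDB;
```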
Then to get the total time with a product:
SELECT *
FROM products p
INNER JOIN product_time pt
ON pt.product_id = p.id
INNER JOIN (
SELECT times_id, SUM(time_in_minutes) as time_in_minutes
FROM times_intervals
GROUP BY times_id
) AS t
ON t.times_id = pt.times_id
Typically, to make this perform well, you would have a non-clustered covering index on times_intervals with the columns times_id and time_in_minutes. Note that the times table is simply a data-less header table at this point: its only purpose is to group the times_intervals rows, and it is only necessary because you have this very similar arrangement for tasks.
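In MySQL terms this would be an ordinary secondary index that happens to contain every column the subquery reads. A sketch, with a hypothetical index name:

```sql
-- Covers the aggregation: both columns the subquery touches are in the index.
CREATE INDEX idx_times_intervals_cover
    ON times_intervals (times_id, time_in_minutes);

-- "Using index" in the Extra column of EXPLAIN indicates that the query
-- is answered from the index alone.
EXPLAIN
SELECT times_id, SUM(time_in_minutes) AS time_in_minutes
FROM times_intervals
GROUP BY times_id;
```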
If there were not two (or more) entities using the times_intervals, you might simply put product_id in the times_intervals and treat it as your header/master id.
I would suggest against adding an id column to times for every table you might join it to. It would break normalization and make joins much more complicated.
If you only have one time (or time interval) for a task or a product, make a column in that table that references the times table. Otherwise, you could make a separate table like
#multitimes(multi_id, time_id)
where the two columns together are a primary key, and then have products and tasks reference multi_id. Then each record in each of those tables can be related to any number of times without any conflicts.
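A rough sketch of that arrangement follows. The column types and the multi_id column added to products are assumptions, and no foreign key is declared on multi_id here because it is not unique on its own in multitimes.

```sql
CREATE TABLE multitimes (
    multi_id INT UNSIGNED NOT NULL,
    time_id  INT UNSIGNED NOT NULL,     -- references the times table
    PRIMARY KEY (multi_id, time_id)
) ENGINE=InnoDB;

-- Assumed column: each product (and likewise each task) carries one multi_id.
ALTER TABLE products ADD COLUMN multi_id INT UNSIGNED NULL;

-- All times attached to each product.
SELECT p.id, p.description, mt.time_id
FROM products p
INNER JOIN multitimes mt ON mt.multi_id = p.multi_id;
```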