Should I use indexes for a many-to-many database table? - mysql

Does it make sense to create indexes for a table called user_movies with the following columns:
user_id
movie_id
This table will be read far more often than it is inserted into or updated, but I'm not sure what to do. Also, is it acceptable to omit a primary key in this situation?

The correct definition for this table is as follows:
CREATE TABLE user_movies (
  user_id INT NOT NULL,
  movie_id INT NOT NULL,
  PRIMARY KEY (user_id, movie_id),
  FOREIGN KEY (user_id) REFERENCES users(user_id),
  FOREIGN KEY (movie_id) REFERENCES movies(movie_id)
) ENGINE=InnoDB;
Notice "primary key" is a constraint, not a column. It's best practice to have a primary key constraint in every table. Do not confuse primary key constraint with an auto-generated pseudokey column.
In MySQL (InnoDB), declaring a primary key implicitly creates an index, and declaring a foreign key creates one too unless a usable index already exists. Yes, these indexes are beneficial.
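As a rough way to check this yourself, the statements below list the indexes MySQL created for the table defined above and show which one a simple lookup would use. The user_id value 42 is just a placeholder, not something from the question.

```sql
-- List every index on the table, including those created implicitly
-- by the PRIMARY KEY and FOREIGN KEY declarations.
SHOW INDEX FROM user_movies;

-- Ask the optimizer which index it would use to find one user's movies.
-- The value 42 is an arbitrary example id.
EXPLAIN SELECT movie_id FROM user_movies WHERE user_id = 42;
```

The `key` column of the EXPLAIN output names the index the optimizer picked; for a lookup by user_id, that would normally be the primary key, since user_id is its left-most column.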

I would index both columns separately, and yes, you can omit the primary key.

I have always heard that you should create a unique index on BOTH columns, first one way (user_id + movie_id) then the other way (movie_id + user_id). It DOES work slightly faster (not much, about 10-20%) in my application with some quick and dirty testing.
It also makes sure you can't have two rows that tie the same movie_id to the same user_id (which could be good, but perhaps not always).
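A minimal sketch of that "both directions" layout might look like the following. The index name uq_movie_user is made up for illustration, and the foreign keys are left out to keep the example short.

```sql
CREATE TABLE user_movies (
  user_id INT NOT NULL,
  movie_id INT NOT NULL,
  -- Serves lookups that start from a user and enforces uniqueness of the pair.
  PRIMARY KEY (user_id, movie_id),
  -- Serves lookups that start from a movie (hypothetical index name).
  UNIQUE KEY uq_movie_user (movie_id, user_id)
) ENGINE=InnoDB;
```

Since the primary key already guarantees that each (user_id, movie_id) pair is unique, the second index could just as well be a plain (non-unique) KEY; it is there for the reverse lookup, not for the constraint.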

If you are using such a "join-table", you'll probably use some joins in your queries -- and those will probably benefit from an index on each one of those two columns (which means two separate indexes).

Related

Composite keys and unique constraints: performance and alternatives

I'm creating a database using MySQL for a music streaming application for my school project. It has a table "song_discoveries" with these columns: user_id, song_id and discovery_date. It has no primary key. The "user_id" and "song_id" columns are foreign keys, and "discovery_date" is self-explanatory. My problem is that I want to ensure that there are no duplicate rows in this table, since obviously a user can discover a song only once, but I'm not sure whether to use a unique constraint on all of the columns or create a composite primary key of all columns. My main concerns are: what is the best practice here, and which has better performance? Are there any alternatives to these approaches?
In MySQL's InnoDB storage engine, a table is stored as a clustered index sorted by the primary key.
If the table has no primary key, but does have a unique key constraint on non-NULL columns, then the unique key becomes the clustered index, and acts almost exactly the same as a primary key. There's no performance difference between these two cases.
The only difference between a primary and a unique key on non-NULL columns is that you can specify the name of the unique key, but the primary key is always called PRIMARY.
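To illustrate the unique-key variant, here is a sketch of the table with no primary key and a named unique key on NOT NULL columns. The key name uq_user_song is an assumption, and so is the choice of (user_id, song_id) as the unique pair.

```sql
CREATE TABLE song_discoveries (
  user_id INT NOT NULL,
  song_id INT NOT NULL,
  discovery_date DATE NOT NULL,
  -- No PRIMARY KEY: InnoDB uses this unique key on NOT NULL columns
  -- as the clustered index. The key name is hypothetical.
  UNIQUE KEY uq_user_song (user_id, song_id)
) ENGINE=InnoDB;
```

Swapping the UNIQUE KEY line for PRIMARY KEY (user_id, song_id) gives the primary-key variant; the only visible difference is that the key is then named PRIMARY.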
If the goal is to have no duplicate rows in this table, you first need to identify what makes a record unique. If uniqueness is defined by the combination of user_id, song_id and discovery_date, then that combination should be your composite primary key.
Thinking a bit more: if we apply the rule "a user can discover a given song only once", then your composite primary key should be (user_id, song_id), which guarantees that the same song is never added twice for the same user. If, however, the same song can be discovered on multiple days, you can keep all three columns in the key.
If you go with user/song then a table can look like this:
CREATE TABLE song_discoveries (
  user_id int NOT NULL,
  song_id int NOT NULL,
  discovery_date DATE NOT NULL,
  PRIMARY KEY (user_id, song_id)
);
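With that definition in place, a quick way to see the constraint at work is to insert the same user/song pair twice. The ids and dates below are made-up sample values.

```sql
-- First discovery: accepted.
INSERT INTO song_discoveries (user_id, song_id, discovery_date)
VALUES (1, 10, '2024-01-15');

-- Same (user_id, song_id) pair again: rejected with a duplicate-key
-- error (MySQL error 1062), even though the date differs.
INSERT INTO song_discoveries (user_id, song_id, discovery_date)
VALUES (1, 10, '2024-02-01');

-- INSERT IGNORE skips the conflicting row and raises a warning
-- instead of an error, leaving the original row unchanged.
INSERT IGNORE INTO song_discoveries (user_id, song_id, discovery_date)
VALUES (1, 10, '2024-02-01');
```

Whether to let the error surface or to use INSERT IGNORE depends on whether the application treats a repeated discovery as a bug or as something to silently skip.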

I want to create multiple tables with the same composite primary key without data redundancy in mysql. How can I achieve this?

I am using mysql to create a database.
I have one base table named GP, with its Primary Key a composite Primary Key(SAT_ID, DATE).
I want to create multiple tables with the same Primary Key (SAT_ID,DATE), but would like to avoid data redundancy.
Is there a way to create a Primary Key for the children tables (for example ID INT NOT NULL AUTO_INCREMENT) that references the composite Primary Key (SAT_ID,DATE), so that I can avoid having the same composite Primary Key (SAT_ID,DATE) in every other table ?
I know the question can seem silly but there is something I don't understand about composite keys and data redundancy.
Thanks
@pepper's solution (suggested in the comments) works just fine:
You could modify your GP table to have an auto-increment ID as its PK and a unique index on (SAT_ID, DATE); then you can use ID as the foreign key in your other tables.
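A sketch of what that could look like follows. The column types, the child table GP_CHILD, its some_value column and the constraint names are all assumptions made for illustration; only GP, SAT_ID, DATE and ID come from the question.

```sql
CREATE TABLE GP (
  ID INT NOT NULL AUTO_INCREMENT,
  SAT_ID INT NOT NULL,            -- type assumed
  `DATE` DATE NOT NULL,           -- quoted to avoid confusion with the DATE type
  PRIMARY KEY (ID),
  -- Still guarantees one row per satellite and date.
  UNIQUE KEY uq_gp_sat_date (SAT_ID, `DATE`)
) ENGINE=InnoDB;

-- Hypothetical child table: it carries only the surrogate ID,
-- not a copy of (SAT_ID, DATE).
CREATE TABLE GP_CHILD (
  GP_ID INT NOT NULL,
  some_value DOUBLE,
  PRIMARY KEY (GP_ID),
  CONSTRAINT fk_gp_child_gp FOREIGN KEY (GP_ID) REFERENCES GP (ID)
) ENGINE=InnoDB;
```

Making GP_ID the child's primary key assumes at most one child row per GP row; if a child table can hold several rows per GP row, it would need its own key and GP_ID would be a plain indexed foreign key.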

When writing MySQL code, what's a better practice?

I'm learning MySQL, still on basic stuff.
My teacher has said that the best scripts first create all the tables and then use ALTER TABLE statements to add the keys. That way, we can properly name the keys. I know for sure he does that with foreign keys. He has taught this with primary key examples as well; however, in the answer files for the exercises he set, he declared the primary keys inside the CREATE TABLE statements and only added the foreign keys later with ALTER TABLE.
How should I do it, then? Always declare primary keys inside the tables and add the foreign keys later? Or should I add both primary and foreign keys with ALTER TABLE? I'm currently trying to do the latter, and bumping into auto_increment issues with the primary keys.
Thank you for your insight!
You can't rename a primary key, so it makes no sense to do it later in an ALTER statement.
You're running into issues with auto_increment because an auto_increment column has to be indexed, and in practice it is usually (part of) the primary key. So you cannot declare an auto_increment column in CREATE TABLE and only add its key afterwards.
That said, the question is largely moot, because you can also name your foreign keys when creating the table. That is the way I prefer: everything is done in one statement. It would look like this:
CREATE TABLE foo (
  id int auto_increment primary key,
  bar int,
  constraint my_fancy_fk_name foreign key (bar) references other_table(whatever_column)
);
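For comparison, the two-step style the teacher describes would look roughly like this for the same table: the primary key stays in CREATE TABLE (so auto_increment works), and only the named foreign key is added afterwards. It assumes other_table already exists and that whatever_column is indexed there.

```sql
-- Step 1: create the table with its primary key declared inline.
CREATE TABLE foo (
  id INT AUTO_INCREMENT PRIMARY KEY,
  bar INT
);

-- Step 2: add the named foreign key in a separate statement.
ALTER TABLE foo
  ADD CONSTRAINT my_fancy_fk_name
  FOREIGN KEY (bar) REFERENCES other_table (whatever_column);
```

Both styles end up with the same schema; the ALTER TABLE form is mainly useful when tables reference each other and cannot all be created in dependency order.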

Two primary non composite keys in a table

I am working on a project and I realized I am unsure about how to use multiple primary keys. I have a table named "User_Details" that has the details of Customer ID, email address and password. From my understanding, I can use both Customer ID and email address as the primary key. In this case do I use only one as Primary Key or both? If I use both, do they become composite primary keys?
(PS. I have other tables, where the foreign key is the customer ID)
You can only have one primary key, but you could definitely have other unique fields.
Usually an integer id is preferred over a string as the primary key. An id is presumably auto-assigned, whereas an email address could change, which would be a problem for foreign key relations.
Since you already use customer Id as a foreign key in other tables, I would suggest you continue to do that.
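As a sketch of that suggestion: keep the customer id as the single primary key and put a unique constraint on the email address. The column names, types and lengths, and the key name below are guesses, since the question does not give the actual definition.

```sql
CREATE TABLE User_Details (
  customer_id INT NOT NULL AUTO_INCREMENT,
  email VARCHAR(255) NOT NULL,
  password_hash VARCHAR(255) NOT NULL,   -- hypothetical column name
  -- The one primary key; other tables reference this column.
  PRIMARY KEY (customer_id),
  -- Email stays unique without being part of the primary key.
  UNIQUE KEY uq_user_details_email (email)
) ENGINE=InnoDB;
```

This is not a composite key: customer_id alone identifies a row, and email is independently unique, so a changed email address never touches the foreign keys in other tables.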
You can only have one primary key, but that primary key can span multiple columns. Alternatively, you can add unique indexes to your table; they work a bit like a primary key in that they enforce unique values and speed up queries on those values.
The easiest way, though, is a composite primary key, which is a primary key made from two or more columns. For example:
CREATE TABLE userdata (
  userid INT,
  userdataid INT,
  info char(200),
  primary key (userid, userdataid)
);
You can have a Composite Primary Key which is a primary key made from two or more columns. For example:
CREATE TABLE userdata (
  userid INT,
  userdataid INT,
  info char(200),
  primary key (userid, userdataid)
);
A table can have multiple candidate keys. Each candidate key is a column or set of columns that are UNIQUE, taken together, and also NOT NULL. Thus, specifying values for all the columns of any candidate key is enough to determine that there is one row that meets the criteria, or no rows at all.
Candidate keys are a fundamental concept in the relational data model.
It's common practice, if multiple keys are present in one table, to designate one of the candidate keys as the primary key. It's also common practice to cause any foreign keys to the table to reference the primary key, rather than any other candidate key.
I recommend these practices, but there is nothing in the relational model that requires selecting a primary key among the candidate keys.

Performance difference between foreign key identifying and non-identifying relationships

I was just adding some foreign keys to my database. Usually all my foreign keys are non-identifying: I never bothered making them identifying because I never knew the difference, and my databases always seemed to work well enough for me.
Now I have decided to design this database properly and am marking the foreign keys as identifying or non-identifying. I was curious: is there any performance difference between them when doing joins?
Thanks
Yes, there could be some performance benefit to joins by making a foreign key on an identifying relationship. But it depends on the query (as optimization methods always do).
For example, querying the books for a given author:
SELECT a.author_name, b.book_name
FROM Authors AS a
JOIN AuthorBooks AS ab ON a.author_id = ab.author_id
JOIN Books AS b ON b.book_id = ab.book_id
WHERE a.author_id = 12345;
In this case, we hope the join to AuthorBooks uses an index. Which index will it use? It depends on how we define the indexes in that table.
The two entity tables are pretty straightforward.
CREATE TABLE Authors (
  author_id INT AUTO_INCREMENT PRIMARY KEY,
  author_name VARCHAR(50)
);
CREATE TABLE Books (
  book_id INT AUTO_INCREMENT PRIMARY KEY,
  book_name VARCHAR(50)
);
But there are two common ways that developers design the many-to-many table. One has an auto-increment id for its primary key:
CREATE TABLE AuthorBooks (
  id INT AUTO_INCREMENT PRIMARY KEY,
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  FOREIGN KEY (author_id) REFERENCES Authors (author_id),
  FOREIGN KEY (book_id) REFERENCES Books (book_id)
);
The other does not have an id. The primary key is the combination of the two foreign keys, and this makes them both have an identifying relationship with their respective referenced entity tables.
CREATE TABLE AuthorBooks (
  author_id INT NOT NULL,
  book_id INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors (author_id),
  FOREIGN KEY (book_id) REFERENCES Books (book_id)
);
What's the difference in terms of performance?
First of all, keep in mind how MySQL implements indexes for foreign keys: if there's no index, the foreign key will implicitly create one. If there's already an index on the column, the foreign key will use it. Even an index that merely has the foreign key column as its left-most column can be used, so there is no need to create a new index for the foreign key.
In the first AuthorBooks table design, as MySQL does the join from Authors to AuthorBooks, it looks up an entry in the index for the author_id foreign key. But to perform the second join, it has to follow that index entry to the row it references to get the book_id value, which it then uses to join to the Books table. So the joins ultimately take an extra table lookup.
In the second AuthorBooks table design, the author_id is indexed by the PRIMARY KEY of the table. So as the join does a lookup to the author_id, it comes with access to the matching book_id, without an extra lookup to the table. The book_id can then be used for the second join. This eliminates a step for each row found by the query.
This turns out to be a great benefit for performance. I have optimized some queries simply by making a many-to-many table use a covering index like this—whether by using the primary key or creating an extra two-column index on the two foreign keys—and this resulted in up to six orders of magnitude improvement for performance.
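For the first design (the one with the auto-increment id), the extra two-column index mentioned above could be added along these lines. The index name idx_author_book is made up; the EXPLAIN simply re-runs the query from the start of this answer so you can inspect the plan.

```sql
-- First design only: add a two-column index so a lookup by author_id
-- can read book_id from the index itself (hypothetical index name).
ALTER TABLE AuthorBooks ADD INDEX idx_author_book (author_id, book_id);

-- Inspect the plan for the example query.
EXPLAIN
SELECT a.author_name, b.book_name
FROM Authors AS a
JOIN AuthorBooks AS ab ON a.author_id = ab.author_id
JOIN Books AS b ON b.book_id = ab.book_id
WHERE a.author_id = 12345;
```

If the covering index is being used, the row for ab in the EXPLAIN output should show "Using index" in its Extra column, meaning the join was resolved from the index without reading the table rows.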
The answer by @billKarwin is really good. I would just add one observation.
Identifying and non-identifying relationships are logical constructs. They model the underlying business domain; see this question (also answered by the ubiquitous @billKarwin). The reason to use logical constructs like this is to make the database easier to understand (and therefore to maintain, extend, etc.). It's not to make your database "faster".