What is the difference between an Index and a Foreign Key? - mysql

I want to create a database with 3 tables. One for posts and one for tags and one that links posts to tags with the post_id and tag_id functioning as foreign key references.
Can you explain what an Index would be in this scenario and how it differs from a Foreign Key and how that impacts my database design?

An index on a table is a data structure that makes random access to the rows fast and efficient. It also helps to optimize the internal organization of the table.
A foreign key is simply a pointer to a corresponding column in another table that forms a referential constraint between the two tables.

An index is added as a fast look up for data in the table.
An index can carry a constraint: the column or columns that make up the index may be required to be unique. A lookup on a unique index returns at most one row; a lookup on a non-unique index can return several. The primary key of a table is a unique index, and it usually has only one column.
A foreign key is a value in a table that references a unique index in another table. It is used to relate two tables together. For example, a child table can look up its one parent row through a column that matches a unique index in the parent table.

You'll have foreign keys in the third table. Indexes are not strictly necessary; you need them when you have lots of data and want to find something by Id quickly. You may want an index on the posts primary key, but the DBMS will probably create it automatically.
An index is a redundant data structure that speeds up some queries.
A foreign key, for practical purposes, is a way to make sure that you have no invalid pointers between the rows in your tables (in your case, from the relationship table to posts and tags).

Question: Can you explain what an Index would be in this scenario and how it differs from a Foreign Key and how that impacts my database design?
Your foreign keys in this case are the two columns in your Posts_Tags table. Each foreign key column must contain a value from the main table it references, in this case the Posts and Tags tables.
Posts_Tags->PostID must be a value contained in Posts->PostID
Posts_Tags->TagID must be a value contained in Tags->TagID
Think of an index as a column that has been given increased speed and efficiency for querying/searching its values, at the cost of increasing the size of your database. Generally, primary keys are indexed, and so are other columns that your website needs to query or search; in your case, probably the name of a post (Posts->PostName).
In your case, indexes will have little impact on your design (they are nice to have for speed and efficiency), but your foreign keys are very important to avoid data corruption (having values in them that don't match a post and/or tag).
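As a rough illustration of the answer above, here is a minimal sketch of the three tables. It uses Python's built-in sqlite3 module as a stand-in for a MySQL server so that it runs anywhere; the column TagName, the index name and the helper functions are assumptions made for the example. The foreign keys decide which link rows are accepted, while the index on PostName only affects lookup speed.

```python
import sqlite3


def build_blog_db():
    """Create Posts, Tags and the Posts_Tags link table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked to (assumption of this sketch:
    # SQLite stands in for MySQL/InnoDB, which enforces them by default).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Posts (PostID INTEGER PRIMARY KEY, PostName TEXT NOT NULL);
        CREATE TABLE Tags  (TagID  INTEGER PRIMARY KEY, TagName  TEXT NOT NULL);
        CREATE TABLE Posts_Tags (
            PostID INTEGER NOT NULL REFERENCES Posts (PostID),
            TagID  INTEGER NOT NULL REFERENCES Tags (TagID),
            PRIMARY KEY (PostID, TagID)
        );
        -- A plain index: it only speeds up searches by post name.
        CREATE INDEX idx_posts_postname ON Posts (PostName);
    """)
    return conn


def try_link(conn, post_id, tag_id):
    """Return True if the link row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Posts_Tags (PostID, TagID) VALUES (?, ?)",
            (post_id, tag_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_blog_db()
conn.execute("INSERT INTO Posts (PostID, PostName) VALUES (1, 'Hello world')")
conn.execute("INSERT INTO Tags (TagID, TagName) VALUES (10, 'mysql')")
print(try_link(conn, 1, 10))  # True: post 1 and tag 10 both exist
print(try_link(conn, 1, 99))  # False: there is no tag 99 to point at
```

Dropping the index would change nothing about which rows are accepted; dropping the foreign keys would let the second insert through.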

You describe a very common database construct; it's called a "many-to-many relation".
Indexes shouldn't impact this schema at all. In fact, indexes shouldn't impact any schema. Indexes are a trade-off between space and time: indexes specify that you're willing to use extra storage space, in exchange for faster searches through the database.
Wikipedia has an excellent article about what database indexes are: Index (database)

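To use foreign keys in MySQL (InnoDB), both columns must be indexed. For example, if you want the field a_id on table b to reference the id field on table a, there must be indexes on both a.id and b.a_id. Older versions required you to create both yourself before you could create the reference; current versions create the index on the referencing column (b.a_id) automatically if it is missing.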
Update: here you can read more about it: http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html

Related

MySQL, Foreign Key vs Indexed ID column with NOT NULL

Let's say you have a Users table and a Posts table.
Users: id, name, email
Posts: id, contents, user_id
If I add an index to "user_id" in the Posts table and set it as NOT NULL, can I expect the same effect as a foreign key?
I know that I can set user_id to any number, whereas a foreign key will force me to set a valid id. Let's assume that user_id is valid. Is there any performance benefit when we set a foreign key?
The main benefit of foreign keys is that they enforce data consistency, meaning that they keep the database clean. In other words, keys are indexes that have integrity rules applied to prevent corruption of data.
An index is a data structure built on columns of a table to speed up the search for records based on the values of the indexed columns. In other words, you gain search speed at the cost of insert/delete speed and storage.
Is there any performance benefit when we set foreign_key?
In performance terms, you will face no improvement.
Foreign keys will impact INSERT, UPDATE and DELETE statements because of the data checking rules, but keep in mind that your data will be consistent.
In MySQL, defining a foreign key constraint automatically creates an index, unless it can use an index that already exists. That is, if you create an index and subsequently add a foreign key on the same column(s), MySQL does not create an extra index just for the foreign key.
If you run a query that needs that index, it doesn't matter if you created the index yourself or if the index was created as a side-effect of adding the foreign key. Either way, the index can help the query. The performance benefit is the same.
If you run a query that does not need that index, then there's no benefit to having the index either way.
You didn't describe any specific SQL query, so there's no way for us to guess whether the index is needed.
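To make the difference discussed in this thread concrete, here is a small sketch that compares a Posts table whose user_id is only indexed and NOT NULL with one that also declares a foreign key. It runs on Python's built-in sqlite3 module rather than MySQL, and the table names posts_indexed and posts_fk and both helper functions are hypothetical names chosen for the comparison.

```python
import sqlite3


def make_db():
    """Build users plus two variants of the posts table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite, not in MySQL/InnoDB
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);

        -- Variant 1: user_id is NOT NULL and indexed, but not a foreign key.
        CREATE TABLE posts_indexed (
            id INTEGER PRIMARY KEY, contents TEXT, user_id INTEGER NOT NULL
        );
        CREATE INDEX idx_posts_indexed_user ON posts_indexed (user_id);

        -- Variant 2: same columns and index, plus a foreign key constraint.
        CREATE TABLE posts_fk (
            id INTEGER PRIMARY KEY, contents TEXT,
            user_id INTEGER NOT NULL REFERENCES users (id)
        );
        CREATE INDEX idx_posts_fk_user ON posts_fk (user_id);
    """)
    return conn


def accepts_post(conn, table, user_id):
    """Return True if a post for user_id can be inserted into the given table."""
    if table not in ("posts_indexed", "posts_fk"):
        raise ValueError("unknown table: " + table)
    try:
        conn.execute(
            "INSERT INTO " + table + " (contents, user_id) VALUES ('hi', ?)",
            (user_id,),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(accepts_post(conn, "posts_indexed", 42))  # True: the index does not check anything
print(accepts_post(conn, "posts_fk", 42))       # False: user 42 does not exist
```

Both variants carry the same index, so a query that filters on user_id is helped equally in either; only the second one refuses orphaned rows.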

How do I make a field reference another field which is not a primary key in MySQL?

I know that foreign keys need not reference only primary keys but they can also reference a field that has a unique constraint on it. For my scenario, I am setting up a quiz where for each test, I have a set of questions. My table design is like this
The point is, in my 2nd table where I will put all the answer options, I want the question number field to link to the first table question number. How do I do this? Or is there an alternative to this design?
Thank you
Ideally there should be a question_id primary key column in the test_question table, and you would use this as the foreign key in the test_answer table.
With your composite primary key in the test_question table, you should make a corresponding composite foreign key:
CONSTRAINT FOREIGN KEY (test_id, question_no) REFERENCES test_question (test_id, question_no)
This is in addition to the foreign key just for the test_id column.
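A sketch of the composite foreign key suggested above, again using Python's sqlite3 module in place of MySQL. The question's actual table design is not shown, so every column other than test_id and question_no (question_text, answer_id, answer_text) is an assumption, and the separate foreign key on test_id alone is left out for brevity.

```python
import sqlite3


def make_quiz_db():
    """Create test_question and test_answer linked by a two-column foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE test_question (
            test_id       INTEGER NOT NULL,
            question_no   INTEGER NOT NULL,
            question_text TEXT,
            PRIMARY KEY (test_id, question_no)
        );
        CREATE TABLE test_answer (
            answer_id   INTEGER PRIMARY KEY,
            test_id     INTEGER NOT NULL,
            question_no INTEGER NOT NULL,
            answer_text TEXT,
            -- Both columns together must match one row of test_question.
            FOREIGN KEY (test_id, question_no)
                REFERENCES test_question (test_id, question_no)
        );
    """)
    return conn


def add_answer(conn, test_id, question_no, text):
    """Return True if the answer row is accepted, False if the FK rejects it."""
    try:
        conn.execute(
            "INSERT INTO test_answer (test_id, question_no, answer_text) VALUES (?, ?, ?)",
            (test_id, question_no, text),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_quiz_db()
conn.execute("INSERT INTO test_question VALUES (1, 1, 'What is an index?')")
print(add_answer(conn, 1, 1, "A lookup structure"))  # True: question (1, 1) exists
print(add_answer(conn, 1, 2, "No such question"))    # False: question (1, 2) is missing
```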
Add another table purely for answers, and link them via the question_no field.
A DB table should hold information on one sort of item. Questions and answers are separate sorts of information so should be in separate tables. Adding a separate table also allows changes to questions and answers independently. Additionally, if they are separate, you could add a language field to each table and have a multilingual quiz.
Short answer:
You can JOIN on any columns or expressions. There is no "requirement" for a FOREIGN KEY, PRIMARY KEY, UNIQUE, or anything else.
Long answer:
However, for performance (in large tables), some things make a difference.
If you are JOINing to a PK, unique key, or even an indexed column, the query could run faster.
Why have a FOREIGN KEY? An FK is two things:
A "constraint" that says that the value must exist in the other table. Also, with things like ON DELETE CASCADE, it can provide actions to take if the indicated row is removed. The constraint requires looking in the other table each time a write occurs (e.g. INSERT).
An Index. That is, specifying a FK automatically adds an INDEX (if not already present) to make the constraint faster.
Getting the id
Here is the "usual" way to do a pair of inserts, where you need the second to 'point' to the first:
INSERT INTO t1 ...;  -- t1 has an AUTO_INCREMENT id
SELECT LAST_INSERT_ID();  -- grab that id
INSERT INTO t2 ...;  -- and include the id from above
For AUTO_INCREMENT to work, the column must be the first column of some key. (Note: a PRIMARY KEY is a UNIQUE key, which in turn is a KEY, aka INDEX.)
Optionally you can specify a FK on the second table to point out the connection between the tables.
And, as spelled out in other answers, a FK could involve more than one column.
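The "pair of inserts" pattern from the "Getting the id" part above can be sketched as follows. This is an analogue rather than MySQL itself: it uses Python's sqlite3 module, where cursor.lastrowid plays the role that LAST_INSERT_ID() plays in MySQL. The columns name, t1_id and note and both function names are made up for the example.

```python
import sqlite3


def make_db():
    """Create t1 (parent) and t2 (child pointing at t1) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
        CREATE TABLE t2 (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            t1_id INTEGER NOT NULL REFERENCES t1 (id),  -- optional FK documenting the link
            note  TEXT
        );
    """)
    return conn


def insert_pair(conn, name, note):
    """Insert a t1 row, then a t2 row that points at it; return the new t1 id."""
    cur = conn.execute("INSERT INTO t1 (name) VALUES (?)", (name,))
    parent_id = cur.lastrowid  # the id generated by the insert just above
    conn.execute("INSERT INTO t2 (t1_id, note) VALUES (?, ?)", (parent_id, note))
    return parent_id


conn = make_db()
print(insert_pair(conn, "first parent", "first child"))  # 1
```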
Entities and Relations
Sometimes, a set of tables like yours is best 'designed' this way:
Determine the "entities": users, tests, questions, answers
Relations and whether they are 1:1, 1:many, or many:many... users:tests is many-to-many; tests:questions is 1:many (unless you want questions to be shared between tests).
Answers is more complex since each answer depends on the user and the question.
1:1 -- rarely practical; may as well merge the tables together.
1:many -- a link (FK?) in one table to the other.
many:many -- need a bridge table with (usually) 2 columns, namely ids linking to the two tables.

SQL Index on foreign key

When I join 2 tables, tbl1 and tbl2, on column1, where column1 is the primary key of tbl1: assuming that column1 is not automatically indexed, should I create an index on both tbl1.column1 and tbl2.column1, or just on tbl2.column1? Does the number of rows in each table affect that choice?
A primary key is automatically indexed. There is no way around that (this is how the "unique" part of the unique constraint is implemented). So, tbl1.column1 has an index. No other index is needed.
As for tbl2.column1, you should probably have an index on that. MySQL does create an index if you explicitly declare a foreign key relationship. So, with the explicit declaration, no other index is necessary. Note: this is not true of all databases.
The presence of indexes does not change the results of queries nor the number of rows in the table, so I don't understand your final question. Indexes implement relational integrity and improve (hopefully!) performance on certain types of queries.
Generally yes, because often you'll want to do the reverse of the join at some point.
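One way to check whether an index on tbl2.column1 is actually picked up is to ask the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. MySQL has its own EXPLAIN with a different output format, so treat this only as an illustration of the idea; the payload column, the index name and the helper function are assumptions.

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable detail strings of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (column1 INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tbl2 (id INTEGER PRIMARY KEY, column1 INTEGER, payload TEXT);
""")

query = "SELECT payload FROM tbl2 WHERE column1 = 5"

before = plan_details(conn, query)  # no index yet: the plan is a full scan of tbl2
conn.execute("CREATE INDEX idx_tbl2_column1 ON tbl2 (column1)")
after = plan_details(conn, query)   # now the plan mentions idx_tbl2_column1

print(before)
print(after)
```

The exact wording of the plan differs between SQLite versions, which is why the example prints it instead of quoting it.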

What are keys used for in MySQL? [duplicate]

Like
PRIMARY KEY (ID),
KEY name (name),
KEY `desc` (`desc`),
etc.
What are they useful for?
Keys are used to enforce referential integrity in your database.
A primary key is, as its name suggests, the primary identification of a given row in your table. That is, each row's primary key will uniquely identify that row.
A unique key is a key that enforces uniqueness on that set of columns. It is similar to a primary key in that it will also uniquely identify a row in a table. However, there is the added benefit of allowing NULL in some of those combinations. There can only be 1 primary key, but you can have many unique keys.
A foreign key is used to enforce a relationship between 2 tables (think parent/child table). That way, a child table can not have a value of X in its parent column unless X actually appears in the parent table. This prevents orphaned records from appearing.
The primary key constraint ensures that the column(s) are:
not null
unique (unique sets if more than one column)
KEY is MySQL's terminology in CREATE TABLE statements for an index. Indexes are not ANSI currently, but all databases use indexes to speed up data retrieval (at the cost of insertion/update/deletion, because of maintenance to keep the index relevant).
There are other key constraints:
unique
foreign key (for referential integrity)
...but your question doesn't include examples of them.
Keys are also called indexes. They are used for speeding up queries. Additionally, keys can be constraints (unique key and foreign key). The primary key is also a unique key, and it identifies the records. A table can have other unique keys as well, which do not allow a value to be duplicated in a given column. A foreign key enforces referential integrity (@Derek Kromm already wrote an excellent description). An ordinary key is used only for speeding up queries. You need to index the columns used in the WHERE clause of your queries. If you have no index on the column, MySQL will need to read the whole table to find the records you need. When an index is used, MySQL reads only the index (which is usually a B+ tree) and then reads only those records from the table that it found in the index.
A PRIMARY KEY creates a unique/not null constraint for each row in the table. Searching by this key is also the fastest. You can create only one PK per table.
An ordinary key/index speeds up searching by the column, as well as sorting, grouping and joining with other tables on it.
Drawback of indexes:
Adding new indexes to a table slows down insert/update/delete statements, so you should select the columns to index very carefully.
Keys are used to relate tables to each other, so that you can create joins to select data from multiple tables.
What, you didn't find the Wikipedia entry comprehensive? ;-)
So, a key, in a relational database (such as MySQL, PostgreSQL, Oracle, etc) is a data constraint on a column or set of columns. The most common keys are the Primary key and foreign keys and unique keys.
A foreign key specifically relates the data of one table to data in another table. You might see that a table blog_posts has a foreign key to users based on a user_id column. This means that every user_id in blog_posts will have a corresponding entry in the users table (this is a one-to-many relationship -- a topic for another time).
If a column (or group of columns) has a unique key, that means that there can only be one such incidence of the key in the table. Often you'll see things like email addresses be unique keys -- you only want one email address per user. I've also seen a combination of columns match to a unique key -- the five columns, first_name, last_name, address, city, and state, will often be a unique key -- realistically, there can only be one William Gates at 1835 73rd Ave NE, Medina, Washington. (I do realize that it is possible for a William Gates Jr. to be born, but the designers of that database didn't really care).
The primary key is the primary, unique identifier of a given table. By definition it is a unique key. It is something which cannot be null and must be unique. It holds a special place of prominence among the indexes of a given table.
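The three kinds of key described in these answers can be tried out side by side. Below is a minimal sketch on Python's sqlite3 module (standing in for MySQL) with a hypothetical users table: a primary key, a unique key on email, and an ordinary index on name. Like MySQL, SQLite lets a unique key hold several NULLs.

```python
import sqlite3


def make_users_db():
    """Create a users table with a primary key, a unique key and a plain index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id    INTEGER PRIMARY KEY,  -- primary key: identifies each row
            email TEXT UNIQUE,          -- unique key: no duplicates, NULL allowed
            name  TEXT
        );
        -- Ordinary key: no constraint, only there to speed up lookups by name.
        CREATE INDEX idx_users_name ON users (name);
    """)
    return conn


def try_insert(conn, email, name):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO users (email, name) VALUES (?, ?)", (email, name))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_users_db()
print(try_insert(conn, "ann@example.com", "Ann"))  # True
print(try_insert(conn, "ann@example.com", "Bob"))  # False: email must be unique
print(try_insert(conn, None, "Ann"))               # True: duplicate names are fine
print(try_insert(conn, None, "Cid"))               # True: a second NULL email is allowed
```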

RDBMS best practices - autoid for association table?

I have two tables, let's say they are called table A and table B. An item from table B can be present in multiple instances of A, and each A can contain multiple Bs so I have a table called a_b which links them together by their primary keys. My question is when I define this association table, should I have a primary key on the association table? Or is it not needed? Just trying to avoid ending up on TDWTF, that's all :)
The primary key would be on the table A PK column and table B PK column in your association table. That way, you ensure you don't get any duplicate rows in your association table by accident.
One of the main purposes of primary keys is to guarantee entity integrity. That is, keep the data in your table clean, with no duplicates. The PK in this case will ensure you never have 2 duplicate rows in the association table.
I think you might want to use a primary key in order to show your intent. If for example you do not want
a, b
a, b
Then a primary key defined on the pair (A.a, B.b) would make that more clear. If you don't care, but you have a, b and other fields, then adding a surrogate key as your primary key might give you a uniform way to delete a row that you do not want. Otherwise you have to delete where a=a and b=b and some other field value picked from the row you want deleted, whereas with a surrogate key you can just pick the row and say delete where mykey = 36 or something...
But really it depends on the business case. Many intersect tables have some kind of date range, or additional fields related to the relationship in addition to the keys of the two tables. Defining a primary key on the existing columns, a new surrogate key, some unique indexes, some constraints, or even having no indexes could all be valid courses of action depending upon your needs.
I would say definitely do whatever makes your intentions the most clear.
Not needed. Both keys should form the primary key of your association table. If you're going to be doing bidirectional navigation, consider adding an index with the keys reversed.
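The answer just above (composite primary key plus an index with the keys reversed) could look roughly like this. The sketch uses Python's sqlite3 module in place of a real MySQL server, and the column names a_id and b_id, the index name and the helper functions are assumptions.

```python
import sqlite3


def make_ab_db():
    """Create tables a, b and the association table a_b in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE a (id INTEGER PRIMARY KEY);
        CREATE TABLE b (id INTEGER PRIMARY KEY);
        CREATE TABLE a_b (
            a_id INTEGER NOT NULL REFERENCES a (id),
            b_id INTEGER NOT NULL REFERENCES b (id),
            PRIMARY KEY (a_id, b_id)  -- no surrogate id; the pair itself is the key
        );
        -- Reversed index for navigating from b back to a.
        CREATE INDEX idx_a_b_reversed ON a_b (b_id, a_id);
        INSERT INTO a (id) VALUES (1), (2);
        INSERT INTO b (id) VALUES (10), (20);
    """)
    return conn


def link(conn, a_id, b_id):
    """Return True if the pair is stored, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO a_b (a_id, b_id) VALUES (?, ?)", (a_id, b_id))
        return True
    except sqlite3.IntegrityError:
        return False


def a_ids_for_b(conn, b_id):
    """List the a ids linked to one b id (the direction the reversed index serves)."""
    rows = conn.execute(
        "SELECT a_id FROM a_b WHERE b_id = ? ORDER BY a_id", (b_id,)
    )
    return [row[0] for row in rows]


conn = make_ab_db()
print(link(conn, 1, 10))  # True
print(link(conn, 1, 10))  # False: the composite primary key rejects the duplicate
```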
A primary key is always needed.
However, I'd say what it should be depends on the situation. If you are going to use some sort of ORM system (e.g. Hibernate), then it is best to have a surrogate identifier, while the two foreign keys (pointing to tables A and B) should form a unique index.
Also, if there would ever be a need to reference such a relationship from another table then this surrogate identifier would be really handy.