Priority of primary key in multiple column index? - mysql

I'm new to SQL and am now working with MySQL.
I'm going through the concept of indexes and I'm not sure what would happen in the following case:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
I have read that here, last_name or (last_name,first_name) can be used for lookup, whereas first_name alone cannot be used for lookup directly (it is not a leftmost prefix of the index).
I have also read that PRIMARY KEY and UNIQUE KEY columns are indexed automatically. So, in my case, where does the id index fit in? Doesn't it count as a leftmost prefix?
select * from test
where id=xxx and last_name=xxxx
Will this use an index lookup, or will it scan the entire table?

First, your query is redundant. The id comparison is sufficient.
The optimizer is going to recognize that two indexes can be used for the query. I'm pretty sure that MySQL will choose the primary key index, because it is unique and clustered. Hence, it is obviously the correct one.
If neither index is unique or a primary key, then MySQL will resort to statistics about the indexes (or arbitrarily choose one of them). You can read about index statistics in the documentation.

Related

Composite keys and unique constraints: performance and alternatives

I'm creating a database using MySQL for a music streaming application for my school project. It has a table "song_discoveries" which has these columns: user_id, song_id and discovery_date. It has no primary key. The "user_id" and "song_id" are foreign keys and the "discovery_date" is self explanatory. My problem is that I want to ensure that there are no duplicate rows in this table since obviously a user can discover a song once, but I'm not sure on whether to use a unique constraint for all of the columns or create a composite primary key of all columns. My main concerns are what is the best practice for this and which has better performance? Are there any alternatives to these approaches?
In MySQL (with the InnoDB storage engine), a table is stored as a clustered index sorted by the primary key.
If the table has no primary key, but does have a unique key constraint on non-NULL columns, then the unique key becomes the clustered index, and acts almost exactly the same as a primary key. There's no performance difference between these two cases.
The only difference between a primary and a unique key on non-NULL columns is that you can specify the name of the unique key, but the primary key is always called PRIMARY.
If the goal is "no duplicate rows in this table", you first need to identify what makes a record unique. If uniqueness is guaranteed by the combination of user_id, discovery_date and song_id, then that should be your composite primary key.
Thinking a bit more, if we apply a rule that says "a song can only be discovered once", then your composite primary key should be (user_id, song_id), which guarantees that the same user never records the same song twice. But if a user can discover the same song on multiple days, then you can leave the key as the combination of the three fields.
If you go with user/song then a table can look like this:
CREATE TABLE song_discoveries (
user_id int NOT NULL,
song_id int NOT NULL,
discovery_date DATE NOT NULL,
PRIMARY KEY (user_id, song_id)
);
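To see the composite key reject a repeat discovery, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, only so the example runs without a server; the discover helper and the sample values are hypothetical. The behaviour shown (a duplicate (user_id, song_id) pair is refused) is what the PRIMARY KEY above is meant to guarantee.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE song_discoveries (
        user_id INT NOT NULL,
        song_id INT NOT NULL,
        discovery_date DATE NOT NULL,
        PRIMARY KEY (user_id, song_id)
    )
    """
)


def discover(user_id, song_id, day):
    """Insert one discovery; return False if the key rejects it as a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO song_discoveries VALUES (?, ?, ?)",
                (user_id, song_id, day),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    discover(1, 10, "2024-01-01"),  # first discovery
    discover(1, 11, "2024-01-01"),  # same user, different song
    discover(2, 10, "2024-01-02"),  # different user, same song
    discover(1, 10, "2024-01-05"),  # same user and song again: rejected
]
row_count = conn.execute("SELECT COUNT(*) FROM song_discoveries").fetchone()[0]
print(results, row_count)
```

If the same song may be discovered on several days, adding discovery_date to the key would let the last insert succeed.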

What is the purpose of using clustered keywords after primary key in MySQL

This is driving me nuts, can someone tell me what is the purpose of using clustered keywords after primary key in MySQL? In which condition I have to use it? Does it depend on the primary key data type?
Example:
create table Orders
(
OrderID int not null auto_increment,
CustID smallint not null, -- FK Customers table
EmpID smallint not null, -- FK Employees table
constraint pk_Orders primary key clustered (OrderID asc)
);
Thanks in advance.
The database creates an index on the primary key column by default.
An index is used much like the index in a book: when you want to look something up, you check the index to see where it occurs in the book.
Obviously, you could find it without an index, but it would be extremely slow.
So an index is something that speeds up queries.
A clustered index (SQL Server, MySQL/InnoDB) is a table stored in an index B-Tree structure. There is no second data structure (heap-table) for the table.
A non-clustered index has no effect on how the data is stored; it just holds information about where to find a particular row based on the indexed column.
More can be found here: Clustered Index / Non-Clustered Index

sql management studio [duplicate]

At work we have a big database with unique indexes instead of primary keys and all works fine.
I'm designing a new database for a new project and I have a dilemma:
In DB theory, a primary key is a fundamental element, that's OK, but in REAL projects what are the advantages and disadvantages of both?
What do you use in projects?
EDIT: ...and what about primary keys and replication on MS SQL server?
What is a unique index?
A unique index on a column is an index on that column that also enforces the constraint that you cannot have two equal values in that column in two different rows. Example:
CREATE TABLE table1 (foo int, bar int);
CREATE UNIQUE INDEX ux_table1_foo ON table1(foo); -- Create unique index on foo.
INSERT INTO table1 (foo, bar) VALUES (1, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (2, 2); -- OK
INSERT INTO table1 (foo, bar) VALUES (3, 1); -- OK
INSERT INTO table1 (foo, bar) VALUES (1, 4); -- Fails!
Duplicate entry '1' for key 'ux_table1_foo'
The last insert fails because it violates the unique index on column foo when it tries to insert the value 1 into this column for a second time.
In MySQL a unique constraint allows multiple NULLs.
It is possible to make a unique index on multiple columns.
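The last two points can be checked with a short sketch. It assumes Python's built-in sqlite3 module as a stand-in (SQLite, like MySQL, treats NULLs in a unique index as distinct from each other); table2, its index name and the try_insert helper are hypothetical names added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE table1 (foo INT, bar INT)")
conn.execute("CREATE UNIQUE INDEX ux_table1_foo ON table1 (foo)")
# Hypothetical second table with a unique index spanning two columns.
conn.execute("CREATE TABLE table2 (a INT, b INT)")
conn.execute("CREATE UNIQUE INDEX ux_table2_a_b ON table2 (a, b)")


def try_insert(table, values):
    """Return 'ok' if the row is accepted, 'duplicate' if a unique index rejects it."""
    try:
        with conn:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", values)
        return "ok"
    except sqlite3.IntegrityError:
        return "duplicate"


# Single-column unique index: repeated NULLs are accepted, repeated 1 is not.
null_outcomes = [
    try_insert("table1", (1, 2)),
    try_insert("table1", (None, 3)),
    try_insert("table1", (None, 4)),
    try_insert("table1", (1, 5)),
]

# Two-column unique index: only the full (a, b) pair has to be unique.
pair_outcomes = [
    try_insert("table2", (1, 1)),
    try_insert("table2", (1, 2)),
    try_insert("table2", (2, 1)),
    try_insert("table2", (1, 1)),
]
print(null_outcomes, pair_outcomes)
```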
Primary key versus unique index
Things that are the same:
A primary key implies a unique index.
Things that are different:
A primary key also implies NOT NULL, but a unique index can be nullable.
There can be only one primary key, but there can be multiple unique indexes.
If there is no clustered index defined then the primary key will be the clustered index.
You can see it like this:
A Primary Key IS Unique
A Unique value doesn't have to be the Representation of the Element
Meaning? Well, a primary key is used to identify the element: if you have a "Person", you would like to have a Personal Identification Number (SSN or such) which is Primary to your Person.
On the other hand, the person might have an e-mail which is unique, but doesn't identify the person.
I always have Primary Keys; even in relationship tables (the mid-table / connection table) I might have them. Why? Well, I like to follow a standard when coding: if the "Person" has an identifier and the Car has an identifier, then the Person -> Car should have an identifier as well!
Foreign keys work with unique constraints as well as primary keys. From Books Online:
"A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table."
For transactional replication, you need the primary key. From Books Online:
"Tables published for transactional replication must have a primary key. If a table is in a transactional replication publication, you cannot disable any indexes that are associated with primary key columns. These indexes are required by replication. To disable an index, you must first drop the table from the publication."
Both answers are for SQL Server 2005.
The choice of when to use a surrogate primary key as opposed to a natural key is tricky. Answers such as "always" or "never" are rarely useful. I find that it depends on the situation.
As an example, I have the following tables:
CREATE TABLE toll_booths (
id INTEGER NOT NULL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
...
UNIQUE(name)
)
CREATE TABLE cars (
vin VARCHAR(17) NOT NULL PRIMARY KEY,
license_plate VARCHAR(10) NOT NULL,
...
UNIQUE(license_plate)
)
CREATE TABLE drive_through (
id INTEGER NOT NULL PRIMARY KEY,
toll_booth_id INTEGER NOT NULL REFERENCES toll_booths(id),
vin VARCHAR(17) NOT NULL REFERENCES cars(vin),
at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
amount NUMERIC(10,4) NOT NULL,
...
UNIQUE(toll_booth_id, vin)
)
We have two entity tables (toll_booths and cars) and a transaction table (drive_through). The toll_booths table uses a surrogate key because it has no natural attribute that is guaranteed not to change (the name can easily be changed). The cars table uses a natural primary key because it has a non-changing unique identifier (vin). The drive_through transaction table uses a surrogate key for easy identification, but also has a unique constraint on the attributes that are guaranteed to be unique at the time the record is inserted.
http://database-programmer.blogspot.com has some great articles on this particular subject.
There are no disadvantages of primary keys.
To add some information to the answers from MrWiggles and Peter Parker: when a table doesn't have a primary key, you won't be able to edit data in some applications (they will end up saying something like "cannot edit / delete data without primary key"). PostgreSQL allows multiple NULL values in a UNIQUE column, while PRIMARY KEY doesn't allow NULLs. Also, some ORMs that generate code may have problems with tables without primary keys.
UPDATE:
As far as I know it is not possible to replicate tables without primary keys in MSSQL, at least without problems (details).
If something is a primary key, depending on your DB engine, the entire table gets sorted by the primary key. This means that lookups are much faster on the primary key because it doesn't have to do any dereferencing as it has to do with any other kind of index. Besides that, it's just theory.
In addition to what the other answers have said, some databases and systems may require a primary key to be present. One situation comes to mind: when using enterprise replication with Informix, a PK must be present for a table to participate in replication.
As long as you do not allow NULL for a value, they should be handled the same, but NULL is handled differently across databases (AFAIK MS SQL does not allow more than one NULL value in a UNIQUE column, while MySQL and Oracle do).
So you must define this column as NOT NULL with a UNIQUE INDEX.
There is no such thing as a primary key in relational data theory, so your question has to be answered on the practical level.
Unique indexes are not part of the SQL standard. The particular implementation of a DBMS will determine what are the consequences of declaring a unique index.
In Oracle, declaring a primary key will result in a unique index being created on your behalf, so the question is almost moot. I can't tell you about other DBMS products.
I favor declaring a primary key. This has the effect of forbidding NULLs in the key column(s) as well as forbidding duplicates. I also favor declaring REFERENCES constraints to enforce referential integrity. In many cases, declaring an index on the column(s) of a foreign key will speed up joins. This kind of index should in general not be unique.
There are some disadvantages of CLUSTERED INDEXES vs UNIQUE INDEXES.
As already stated, a CLUSTERED INDEX physically orders the data in the table.
This means that when you have a lot of inserts or deletes on a table containing a clustered index, every time (well, almost, depending on your fill factor) you change the data, the physical table needs to be updated to stay sorted.
In relatively small tables this is fine, but when you get to tables that hold gigabytes of data, and inserts/deletes affect the sorting, you will run into problems.
I almost never create a table without a numeric primary key. If there is also a natural key that should be unique, I also put a unique index on it. Joins are faster on integers than multicolumn natural keys, data only needs to change in one place (natural keys tend to need to be updated which is a bad thing when it is in primary key - foreign key relationships). If you are going to need replication use a GUID instead of an integer, but for the most part I prefer a key that is user readable especially if they need to see it to distinguish between John Smith and John Smith.
The few times I don't create a surrogate key are when I have a joining table that is involved in a many-to-many relationship. In this case I declare both fields as the primary key.
My understanding is that a primary key and a unique index with a not‑null constraint are the same (*), and I suppose one chooses one or the other depending on what the specification explicitly states or implies (a matter of what you want to express and explicitly enforce). If it requires uniqueness and not‑null, then make it a primary key. If it just happens that all parts of a unique index are not‑null without any requirement for that, then just make it a unique index.
The sole remaining difference is that you may have multiple not‑null unique indexes, while you can't have multiple primary keys.
(*) Excepting a practical difference: a primary key can be the default unique key for some operations, like defining a foreign key. For example, if one defines a foreign key referencing a table and does not provide the column name, and the referenced table has a primary key, then the primary key will be the referenced column. Otherwise, the referenced column will have to be named explicitly.
Others here have mentioned DB replication, but I don't know about it.
In SQL Server, a unique index can contain one NULL value. By default it is created as a NON-CLUSTERED INDEX.
A primary key cannot contain NULL values. By default it is created as a CLUSTERED INDEX.
In MSSQL, Primary keys should be monotonically increasing for best performance on the clustered index. Therefore an integer with identity insert is better than any natural key that might not be monotonically increasing.
If it were up to me...
You need to satisfy the requirements of the database and of your applications.
Adding an auto-incrementing integer or long id column to every table to serve as the primary key takes care of the database requirements.
You would then add at least one other unique index to the table for use by your application. This would be the index on employee_id, or account_id, or customer_id, etc. If possible, this index should not be a composite index.
I would favor indices on several fields individually over composite indices. The database will use the single field indices whenever the where clause includes those fields, but it will only use a composite when you provide the fields in exactly the correct order - meaning it can't use the second field in a composite index unless you provide both the first and second in your where clause.
I am all for using calculated or Function type indices - and would recommend using them over composite indices. It makes it very easy to use the function index by using the same function in your where clause.
This takes care of your application requirements.
It is highly likely that other non-primary indices are actually mappings of that index's key value to a primary key value, not to row IDs. This allows physical sorting operations and deletes to occur without having to recreate these indices.

Can I add a compound unique key if one of the fields is already unique

In MySQL, does the following statement make sense?
CREATE TABLE `sku_classification` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sku` int(10) unsigned NOT NULL,
`business_classification_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `IDX_SKU_BUSINESS_CLASSIFICATION` (`sku`,`business_classification_id`),
UNIQUE KEY `sku` (`sku`)
)
Is it an unnecessary overkill to add a unique key on a combination of fields (sku,business_classification_id), one of which (sku) already has unique index on it? Or is it not, and there is indeed some reason for such duplicate unique index?
Yes, you can. But it does not make sense. But, let's analyze what is going on.
An INDEX (UNIQUE or not) is a BTree that facilitates lookups in the table.
A UNIQUE index is both an index and a "constraint" saying that there shall not be any duplicates.
You have already said UNIQUE(sku). This provides both an index and a uniqueness constraint.
Adding UNIQUE(sku, x) in that order:
Does not provide any additional uniqueness constraint,
Does not provide any additional indexing capability, except...
Does provide a "covering" index that could be useful if the only two columns mentioned in a SELECT were sku and x. Even so, you may as well make it an INDEX, not a UNIQUE, because...
Every INSERT must do some extra effort to prevent "duplicate key". (OK, the INSERT code is not smart enough to see that you have UNIQUE(sku).)
If that is your complete table, there is no good reason to have the id AUTO_INCREMENT; you may as well promote sku to be the PRIMARY KEY. (A PK is a UNIQUE KEY.)
Furthermore... If, on the other hand, you were suggesting UNIQUE(x, sku), then there is one slight difference. This provides you a way to efficiently look up by x -- a range of x, or x=constant AND sku BETWEEN ..., or certain other things that are not provided by (sku, x). Order matters in an index. But, again, it may as well be INDEX(x, sku), not UNIQUE.
So, the optimal set of indexes for the table as presented is not 3 indexes, but 1:
PRIMARY KEY(sku)
One more note: With InnoDB, the PK is "clustered" in BTree with the data. That is, looking up by the PK is very efficient. When you need to go through a "secondary index", there are two steps: first drill down the secondary index's BTree to find the PK, then drill down the PK's BTree.
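The two-step lookup can be pictured with a small model. This is only a sketch under stated assumptions: plain Python dicts stand in for the BTrees (real indexes are ordered trees, not hash maps), the table is the original one with id as the PRIMARY KEY and a secondary index on sku, and the sample rows and function names are hypothetical.

```python
# Clustered index: primary key -> the full row (the row data lives here).
clustered_by_id = {
    1: {"id": 1, "sku": 5001, "business_classification_id": 7},
    2: {"id": 2, "sku": 5002, "business_classification_id": None},
    3: {"id": 3, "sku": 5003, "business_classification_id": 7},
}

# Secondary index: indexed column value -> primary key (not the row itself).
secondary_by_sku = {row["sku"]: pk for pk, row in clustered_by_id.items()}


def find_by_id(pk):
    """One lookup: the clustered index entry already holds the row."""
    return clustered_by_id.get(pk), 1


def find_by_sku(sku):
    """Two lookups: the secondary index yields the PK, then the clustered index yields the row."""
    pk = secondary_by_sku.get(sku)
    if pk is None:
        return None, 1  # not found after searching the secondary index only
    return clustered_by_id[pk], 2


row_a, steps_a = find_by_id(2)
row_b, steps_b = find_by_sku(5003)
print(steps_a, steps_b)
```

Promoting sku to the PRIMARY KEY, as suggested above, would turn every lookup by sku into the one-step case.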

is there any difference with joint primary key order?

I am curious: does the column order in a joint (composite) primary key make any difference?
For example, is there any difference between the primary keys of these two tables? Or does the key order make no difference to the table?
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`user_id`,`retweet_id`)
)
vs
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`retweet_id`,`user_id`)
)
It would make a difference in the index structure.
In a composite index, the index value consists of several values that go one after another, and the order determines which queries can be optimized using this particular index.
For example:
For the index created as
PRIMARY KEY (`user_id`,`retweet_id`)
A query like WHERE user_id = 42 can be optimized (not guaranteed, but technically possible), whereas a query like WHERE retweet_id = 4242 cannot.
PS: it's a good idea to always have an artificial primary key, like a sequence (or an auto-increment column in the case of MySQL), instead of using natural primary keys. This is better because the primary key is a clustered key, which means it defines how rows are physically stored in pages on disk, so it's a good idea for a PK to be monotonically increasing (or decreasing, it doesn't matter).
The order does affect how the index is used in queries. When you use multiple columns, each column is a sub-tree of the preceding column.
In your first case (user_id, retweet_id) - if you searched the index for user_id 1, you then have all the retweet_ids under that.
Subsequently, if you wish to search for only retweet_id=7 (for all users), the index cannot be used, because you would first need to step through each user's items in the index.
So if you wish to query for user_id, or retweet_id individually (without the other), put that column first. If you need both you could consider adding a secondary index.
There are also limitations for range scans: you can only effectively use the last column queried for the range scan. You can read more about all of this here:
http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html
Additionally if using InnoDB, the tables are stored in order of the PRIMARY KEY. This might matter for performance depending on how you query your data.
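The ordering rule discussed in the answers above can be illustrated with a small model. This is a sketch, not how MySQL is implemented: a sorted Python list of (user_id, retweet_id) tuples stands in for the BTree of PRIMARY KEY (user_id, retweet_id), and the sample values and function names are hypothetical.

```python
from bisect import bisect_left

rows = [
    ("alice", "r3"),
    ("bob", "r1"),
    ("alice", "r1"),
    ("carol", "r2"),
    ("bob", "r3"),
]

# Entries are ordered by user_id first, then by retweet_id within each user.
index = sorted(rows)


def seek_user(index, user_id):
    """Leading column: binary-search to the first match, then read the contiguous run."""
    start = bisect_left(index, (user_id,))
    matches = []
    for entry in index[start:]:
        if entry[0] != user_id:
            break  # past the run for this user; nothing further can match
        matches.append(entry)
    return matches


def scan_retweet(index, retweet_id):
    """Second column alone: matches are scattered, so every entry must be examined."""
    return [entry for entry in index if entry[1] == retweet_id]


bob_entries = seek_user(index, "bob")
r1_entries = scan_retweet(index, "r1")
# Positions of the r1 matches inside the index; they are not adjacent.
r1_positions = [i for i, entry in enumerate(index) if entry[1] == "r1"]
print(bob_entries, r1_entries, r1_positions)
```

Sorting the same rows as (retweet_id, user_id) would reverse the situation: lookups by retweet_id become a seek, and lookups by user_id alone become a full scan.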