Index a mysql table of 3 integer fields - mysql

I have a mysql table of 3 integer fields. None of the fields have a unique value - but the three of them combined are unique.
When I query this table, I only search by the first field.
Which approach is recommended for indexing such a table?
Having a multiple-field primary key on the 3 fields, or setting an index on the first field, which is not unique?
Thanks,
Doori Bar

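Both, in effect. You'll need the multi-field primary key to ensure uniqueness, and you'll want an index on the first field for speed during searches. If the first field is also the leading column of the primary key, that key's index already covers searches on it, so a separate index is only needed otherwise.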

You can have a UNIQUE Constraint on the three fields combined to meet your data quality standards. If you are primarily searching by Field1 then you should have an index on it.
You should also consider how you JOIN this table.
Your indexes should really support the bigger workload first - you will have to look at the execution plan to determine what suits you best.

The primary key will prevent your application from accidentally inserting duplicate rows. You probably want that.
Order the columns in the PK correctly though (put the field you search by first), or make an index on the first column clustered for better performance. Compare how the query runs with the PK present, both with and without the index on the first column.

If you're using InnoDB, you must have a clustered index. If you don't specify one, MySQL will use one in the background anyway. So, you may as well use a clustered (unique) primary key by combining all three columns.
The primary key will also then prevent duplicates, which is a bonus.
If you're returning all three integer fields, then you'll have a covered index, which means that the database won't even have to touch the actual record. It will get everything it needs right from the index.
The only caveat would be inserts (and appends). Updating a clustered index, especially one on multiple columns, does carry some performance penalty. It will be up to you to test and determine the best approach.
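Here is a minimal sketch of the composite-key approach from the answers above. It uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so the example is self-contained; InnoDB's clustered storage is not reproduced, and the table name `triples` and columns `a`, `b`, `c` are hypothetical. It shows one key both rejecting a duplicate combination and serving a search on the first field.

```python
import sqlite3

# In-memory SQLite database: an assumption for illustration, not MySQL itself.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    " a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL,"
    " PRIMARY KEY (a, b, c))"
)
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (2, 1, 1)],
)

# The composite key rejects a repeated (a, b, c) combination.
try:
    conn.execute("INSERT INTO triples VALUES (1, 1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A search on the leading column alone can use the same key's index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT a, b, c FROM triples WHERE a = ?", (1,)
).fetchall()
plan_detail = " ".join(row[3] for row in plan)  # 4th field is the plan text

rows = conn.execute(
    "SELECT a, b, c FROM triples WHERE a = ? ORDER BY b, c", (1,)
).fetchall()

print(duplicate_rejected)
print(plan_detail)
print(rows)
```

On MySQL you would run EXPLAIN against your own table instead; the plan format differs, but the thing to check is the same: whether the query on the first field uses the primary key.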

Related

Is there a write performance improvement to using INDEX instead of UNIQUE indexes?

This describes different indexes:
KEY or INDEX refers to a normal non-unique index. Non-distinct values
for the index are allowed, so the index may contain rows with
identical values in all columns of the index. These indexes don't
enforce any restraints on your data so they are used only for making
sure certain queries can run quickly.
UNIQUE refers to an index where all rows of the index must be unique.
That is, the same row may not have identical non-NULL values for all
columns in this index as another row. As well as being used to speed
up queries, UNIQUE indexes can be used to enforce restraints on data,
because the database system does not allow this distinct values rule
to be broken when inserting or updating data.
I understand the benefit to application logic (you don't want uniqueness check) but is there also a performance improvement? Specifically, how much faster are writes using INDEX instead of UNIQUE?
UNIQUE KEY is a constraint, and you use it when you want to enforce that constraint.
KEY is an index, which you pick to make certain queries more efficient.
The performance of inserting into a table with either type of index is virtually the same. That is, the difference, if any, is so minor that it's not worth picking one over the other for the sake of performance.
Choose the type of index to support your constraints. Use UNIQUE KEY if and only if you want to enforce uniqueness. Use KEY otherwise.
Your question is like asking, "which is faster, a motorcycle or a speedboat?" They are used in different situations, so judging them on their speed isn't the point.
INSERT
When a row is inserted, all unique keys (PRIMARY and UNIQUE) are immediately checked for duplicate keys. This is so that you get an error on the INSERT if necessary. The updating of non-unique INDEXes is delayed (for discussion, see "Change buffering"). The work will be done in the background so your INSERT won't be waiting for it.
So, there is a slight overhead in UNIQUE for inserting. But, as already pointed out, if you need the uniqueness constraint, then use it.
SELECT
Any kind of index (PRIMARY, UNIQUE, or INDEX) may be used to speed up a SELECT. Mostly, the types of index work identically. However, with PRIMARY and UNIQUE, the optimizer can know that there will be only one (or possibly zero) rows matching a given value, so it can fetch the one row, then quit. For a non-unique index, there could be more than one row, so it keeps scanning the index, checking for more rows. This scan stops after peeking at the first non-matching row. So, there is a small (very small) overhead for non-unique indexes versus unique.
Bottom Line
The performance issues are less important than the semantics (uniqueness constraint vs. not).
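To make the semantic difference concrete, here is a small sketch, again using Python's sqlite3 module as an assumed stand-in for MySQL. The table and index names (`plain_t`, `unique_t`, `idx_plain`, `idx_unique`) are hypothetical. It only illustrates the constraint behaviour, not the timing difference discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table with a plain (non-unique) index, one with a UNIQUE index.
conn.execute("CREATE TABLE plain_t (v INTEGER)")
conn.execute("CREATE INDEX idx_plain ON plain_t (v)")
conn.execute("CREATE TABLE unique_t (v INTEGER)")
conn.execute("CREATE UNIQUE INDEX idx_unique ON unique_t (v)")


def try_insert(table, value):
    """Return True if the insert succeeded, False on a uniqueness violation."""
    try:
        conn.execute(f"INSERT INTO {table} (v) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


# Insert the same value twice into each table.
results = {
    "plain": [try_insert("plain_t", 7), try_insert("plain_t", 7)],
    "unique": [try_insert("unique_t", 7), try_insert("unique_t", 7)],
}
print(results)
```

The plain index accepts the repeated value; the UNIQUE index raises an error on the second insert, which is the check that has to happen immediately at INSERT time.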

Is it unnecessary to create additional indexes?

If I have a unique key on a column, would that make it unnecessary to have an additional index on the same column?
What if it is a multi-column unique key? Would that make it unnecessary to have an additional index on the FIRST column in the unique index column list?
There are similar questions on SO, but I wanted to get a definitive answer in this narrower context.
If I have a unique key on a column, would that make it unnecessary to have an additional index on the same column?
Yes. UNIQUE KEY implicitly creates an index.
What if it is a multi-column unique key? Would that make it unnecessary to have an additional index on the FIRST column in the unique index column list?
Yes. Any query that needs to search the first column can use the multi-column index.
By analogy, if I ask you to search the telephone book for people named "Smith, Sarah" it's like you search a compound index with two columns. If I ask you to use the same telephone book to search for people with last name "Smith" the sort order of the first column still helps, even though you don't need to search the second column.
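The telephone book analogy can be sketched directly in Python. This is only an illustration of the idea, not how MySQL stores indexes: a sorted list of (last_name, first_name) tuples plays the role of the compound index, and the names in it are made up.

```python
from bisect import bisect_left

# A "compound index" on (last_name, first_name): a sorted list of tuples.
phone_book = sorted([
    ("Jones", "Alice"),
    ("Smith", "Sarah"),
    ("Smith", "Adam"),
    ("Taylor", "Bob"),
    ("Smith", "Zoe"),
])


def find_by_last_name(book, last_name):
    """Seek to the first entry for last_name, then read until it changes."""
    # (last_name,) sorts before every (last_name, first_name) tuple.
    start = bisect_left(book, (last_name,))
    matches = []
    for entry in book[start:]:
        if entry[0] != last_name:
            break  # first non-matching entry: stop scanning
        matches.append(entry)
    return matches


def find_by_first_name(book, first_name):
    """The sort order does not help here: every entry must be checked."""
    return [entry for entry in book if entry[1] == first_name]


smiths = find_by_last_name(phone_book, "Smith")
sarahs = find_by_first_name(phone_book, "Sarah")
print(smiths)
print(sarahs)
```

Searching by the first column is a binary search plus a short scan; searching by the second column alone has to walk the whole list, which is why an index on (last, first) does not replace an index on first.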
would that make it unnecessary to have an additional index on the same
column?
Yes.
What if it is a multi-column unique key?
Yes, it is fine to have a few more indexes along with unique indexes.
==Detailed Explanation==
You should keep in mind that each time you perform DML on your table, it has to restructure your indexes. So this depends entirely on what you are trying to achieve and what you are ready to compromise on.
Assume you are working on some database whose query performance (SELECT) is slow. You create an index, which improves performance, so you conclude that the index was necessary.
Later, after some days, you start observing slowness when performing DML operations on the same table. Depending on what you are ready to compromise on, the index can then be considered (un)necessary for you.
8.3 Optimization and Indexes
Although it can be tempting to create an index for every possible
column used in a query, unnecessary indexes waste space and waste time
for MySQL to determine which indexes to use. Indexes also add to the
cost of inserts, updates, and deletes because each index must be
updated. You must find the right balance to achieve fast queries using
the optimal set of indexes.

SQL - Primary vs Unique not null [duplicate]

What are the differences between PRIMARY, UNIQUE, INDEX and FULLTEXT when creating MySQL tables?
How would I use them?
Differences
KEY or INDEX refers to a normal non-unique index. Non-distinct values for the index are allowed, so the index may contain rows with identical values in all columns of the index. These indexes don't enforce any restraints on your data so they are used only for access - for quickly reaching certain ranges of records without scanning all records.
UNIQUE refers to an index where all rows of the index must be unique. That is, the same row may not have identical non-NULL values for all columns in this index as another row. As well as being used to quickly reach certain record ranges, UNIQUE indexes can be used to enforce restraints on data, because the database system does not allow the distinct values rule to be broken when inserting or updating data.
Your database system may allow a UNIQUE index to be applied to columns which allow NULL values, in which case two rows are allowed to be identical if they both contain a NULL value (the rationale here is that NULL is considered not equal to itself). Depending on your application, however, you may find this undesirable: if you wish to prevent this, you should disallow NULL values in the relevant columns.
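A short sketch of the NULL behaviour described above, using Python's sqlite3 module as an assumed stand-in (the source says this depends on the database system, so check your own engine). The table `t` and columns `a`, `b` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both columns are nullable; the pair (a, b) is declared UNIQUE.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, UNIQUE (a, b))")


def insert(a, b):
    """Return 'ok' if the row was accepted, 'rejected' on a uniqueness error."""
    try:
        conn.execute("INSERT INTO t (a, b) VALUES (?, ?)", (a, b))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


outcomes = [
    insert(1, None),  # first row containing a NULL
    insert(1, None),  # looks identical, but NULL is not equal to NULL
    insert(1, 2),     # first row with non-NULL values
    insert(1, 2),     # true duplicate of non-NULL values
]
print(outcomes)
```

Declaring both columns NOT NULL would make the second insert fail as well, which is the fix the answer suggests when this behaviour is undesirable.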
PRIMARY acts exactly like a UNIQUE index, except that it is always named 'PRIMARY', and there may be only one on a table (and there should always be one; though some database systems don't enforce this). A PRIMARY index is intended as a primary means to uniquely identify any row in the table, so unlike UNIQUE it should not be used on any columns which allow NULL values. Your PRIMARY index should be on the smallest number of columns that are sufficient to uniquely identify a row. Often, this is just one column containing a unique auto-incremented number, but if there is anything else that can uniquely identify a row, such as "countrycode" in a list of countries, you can use that instead.
Some database systems (such as MySQL's InnoDB) will internally store a table's actual records within the PRIMARY KEY's B-tree index.
FULLTEXT indexes are different from all of the above, and their behaviour differs significantly between database systems. FULLTEXT indexes are only useful for full text searches done with the MATCH() / AGAINST() clause, unlike the above three - which are typically implemented internally using b-trees (allowing for selecting, sorting or ranges starting from left most column) or hash tables (allowing for selection starting from left most column).
Where the other index types are general-purpose, a FULLTEXT index is specialised, in that it serves a narrow purpose: it's only used for a "full text search" feature.
Similarities
All of these indexes may have more than one column in them.
With the exception of FULLTEXT, the column order is significant: for the index to be useful in a query, the query must use columns from the index starting from the left - it can't use just the second, third or fourth part of an index, unless it is also using the previous columns in the index to match static values. (For a FULLTEXT index to be useful to a query, the query must use all columns of the index.)
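The column-order rule can be checked with a query plan. The sketch below uses Python's sqlite3 module as an assumed stand-in for MySQL; the table `t`, its columns and the index name `idx_ab` are hypothetical, and the plan text is SQLite's, not MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")  # column order: a first, then b


def explain(sql):
    """Return the human-readable part of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Uses the leftmost column: the index can be searched.
plan_a = explain("SELECT payload FROM t WHERE a = 1")
# Uses both columns, starting from the left: the index can be searched.
plan_ab = explain("SELECT payload FROM t WHERE a = 1 AND b = 2")
# Skips the leftmost column: the index ordering does not help.
plan_b = explain("SELECT payload FROM t WHERE b = 2")

print(plan_a)
print(plan_ab)
print(plan_b)
```

The first two plans mention `idx_ab`; the third falls back to scanning the table, because the query does not start from the left of the index.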
All of these are kinds of indices.
primary: must be unique, is an index, is (likely) the physical index, can be only one per table.
unique: as it says. You can't have more than one row with a tuple of this value. Note that since a unique key can be over more than one column, this doesn't necessarily mean that each individual column in the index is unique, but that each combination of values across these columns is unique.
index: if it's not primary or unique, it doesn't constrain values inserted into the table, but it does allow them to be looked up more efficiently.
fulltext: a more specialized form of indexing that allows full text search. Think of it as (essentially) creating an "index" for each "word" in the specified column.
I feel like this has been well covered, maybe except for the following:
Simple KEY / INDEX (otherwise called SECONDARY INDEX) do increase performance if selectivity is sufficient. On this matter, the usual recommendation is that if the number of records in the result set on which an index is applied exceeds 20% of the total number of records of the parent table, then the index will be ineffective. In practice each architecture will differ, but the idea is still correct.
Secondary indexes (and this is very specific to MySQL) should not be seen as completely separate and different objects from the primary key. In fact, both should be used jointly and, once this is known, it provides an additional tool to the MySQL DBA: in MySQL, indexes embed the primary key. This leads to significant performance improvements, specifically when cleverly building implicit covering indexes such as described there.
If you feel like your data should be UNIQUE, use a unique index. You may think it's optional (for instance, working it out at application level) and that a normal index will do, but it actually represents a guarantee for MySQL that each row is unique, which incidentally provides a performance benefit.
You can only use FULLTEXT (otherwise called SEARCH INDEX) with the InnoDB (in MySQL 5.6.4 and up) and MyISAM engines.
You can only use FULLTEXT on CHAR, VARCHAR and TEXT column types.
FULLTEXT index involves a LOT more than just creating an index. There's a bunch of system tables created, a completely separate caching system and some specific rules and optimizations applied. See http://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html and http://dev.mysql.com/doc/refman/5.7/en/innodb-fulltext-index.html

MySQL Indexing, Different types and when to use them?

OK, so I have researched a lot regarding MySQL indexes and their importance, and I understand I have to use them to make database queries run faster.
And I know the syntax to add an index on any field.
But here's what I don't understand (I am using Heidi SQL to manage the database locally):
I have a table with the following fields
id
company_id
author_id
client_id
project_id
title
description
status
date
Here, id is the primary key, auto incremented and is already indexed.
I want to add an index on company_id, author_id, client_id and project_id, but I have a few different options here (Heidi SQL: select field, right click, create new index): Key, Unique, Full Text, Spatial.
I know (guess) Key will just index the field, Unique will make sure that the field has to be unique, and a Full Text index will be best if I plan to perform searches on that field.
Question 1: What does Spatial do, and where and when should I use this index?
Question 2: While creating an index, I have 2 options: either create a new index or add to an already existing index (like the name of the primary field).
What's the difference between creating a new index and adding to an already existing index? Is creating an index under a different name for every field I index a good idea, or should I create/add all the indexes under the same name?
Thanks for your time.
You should start with a simple Key index. If you add Unique that can have unintended consequences, for example enforcing uniqueness on a column like company_id would definitely not work (lots of rows will share the same company_id).
A Spatial index is only for geo-coding data (latitude/longitude values), so it does not apply to any of these columns. A Fulltext index is used when you want to search by words within the fields; it also does not apply to the numeric values that you have. Further, before MySQL 5.6.4 a Fulltext index was only available in MyISAM, not the InnoDB transactional engine, so on older versions it would be a barrier if you ever wanted to migrate the table.
If you add to an existing index, you are making a "compound" index on more than one column. Typically this is not a good idea, unless you are trying to enforce uniqueness in an index on a column that is not already unique. For example, you could index: company_id + id, and make it unique in that case. However, the index takes more space and will be slower on writes.
In summary, you should just use a normal KEY index on the columns you want to search by. MySQL will generally use only one of the indexes per table for each query. To see how it is actually accessing the data, you can use the EXPLAIN utility. For best performance, you want to make sure it is using an index in an effective way to narrow down the number of rows it needs to search or return.
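As a rough illustration of checking the plan before and after adding a normal index, here is a sketch using Python's sqlite3 module as an assumed stand-in for MySQL. The columns come from the question; the table name `tasks` and index name `idx_company` are hypothetical. MySQL's EXPLAIN prints a different layout (for example `key` and `rows` columns), but it is read for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " company_id INTEGER, author_id INTEGER, client_id INTEGER,"
    " project_id INTEGER, title TEXT, description TEXT,"
    " status TEXT, date TEXT)"
)


def explain(sql):
    """Return the human-readable part of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT title FROM tasks WHERE company_id = 42"

plan_before = explain(query)  # no index on company_id yet
conn.execute("CREATE INDEX idx_company ON tasks (company_id)")
plan_after = explain(query)   # same query, now with a plain index

print(plan_before)
print(plan_after)
```

Before the index exists the plan is a full scan of the table; afterwards it searches `idx_company`, which is the change you want to see for the columns you filter on.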

Using Primary Keys as Index

In my application I usually use my primary keys as a way to access data. However, I've been told in order to increase performance, I should index columns in my table. But I have no idea what columns to index.
Now the Questions:
Is it a good idea to create an index on your primary key?
How would I know what columns to index?
Is it a good idea to create an index on your primary key?
Primary keys are implemented using a unique index automatically in Postgres. You are done here.
The same is true for MySQL. See:
Is the primary key automatically indexed in MySQL?
How would I know what columns to index?
For advice on additional indices, see:
Optimize PostgreSQL read-only tables
Again, the basics are the same for MySQL and Postgres. But Postgres has more advanced features like partial or functional indices if you need them. Start with the basics, though.
Your primary key will already have an index that is created for you automatically by PostgreSQL. You do not need to index the column again.
As far as the rest of the fields go, take a look at the article here on figuring out cardinality:
http://kirk.webfinish.com/2013/08/some-help-to-find-uniqueness-in-a-large-table-with-many-fields/
Fields that are completely unique are candidates, fields that have no uniqueness at all are useless to index. The sweet spot is the cardinality in the middle (.5).
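For a quick feel of what "cardinality" means here, the sketch below computes the ratio of distinct values to total rows for a column. This definition is an assumption on my part (the linked article may measure it differently), and the sample columns are invented.

```python
def cardinality_ratio(values):
    """Distinct values divided by total rows (assumed definition)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical columns: a unique id and a status flag with 3 possible values.
ids = list(range(1000))
statuses = ["open", "closed", "pending"] * 400  # 1200 rows, 3 distinct values

id_ratio = cardinality_ratio(ids)
status_ratio = cardinality_ratio(statuses)
print(id_ratio, status_ratio)
```

A ratio of 1.0 means every value is unique; a ratio near 0 means the same few values repeat across most rows. In SQL the same figure would come from COUNT(DISTINCT col) divided by COUNT(*).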
And of course you should take a look at which columns you are using in the WHERE clause. It is useless to index columns that are not a part of your quals.
Primary keys will have an index only if you formally define them as primary keys. Where most people forget to make indexes is on foreign keys, which are not generally automatically indexed and almost always will be involved in joins, and thus should be indexed. Other candidates for indexes are things you frequently filter data on that have a large number of possible values, things like names, part numbers, start dates, etc.
1) Is it a good idea to make your primary key as an Index? (assuming the primary key is unique, an id)
All DBMSes I know of will automatically create an index underneath the PK.
In case of MySQL/InnoDB, PK will not just be indexed, but that index will be clustered index.
(BTW, just saying "primary key" implies it is unique, so there is no need to explicitly state "assuming the primary key is unique".)
2) how would I know what columns to index ?
That depends on which queries need to be supported.
But beware that adding indexes is not free and is a matter of engineering tradeoff - while some queries might benefit from an index, some may actually suffer from it. For example:
An index on FOO would significantly speed-up the SELECT * FROM T WHERE FOO = ....
However, the same index would somewhat slow-down the INSERT INTO T VALUES (...).
In most situations you'd favor large speedup in SELECT over small slowdown in INSERT, but that may not always be the case.
Indexing and the database performance in general are a complex topic beyond the scope of a humble StackOverflow post, but if you are interested I warmly recommend reading Use The Index, Luke!.
Your primary key will always be an index.
Always create indexes on columns that help to reduce the search. For example, if a column has only 3 different values among more than a thousand rows, it is a good sign to make it an index.