I am planning to implement database search for a website. I know MySQL offers full-text search, but it turns out that it is not supported for the InnoDB engine (which I need for transaction support).
Other options are Sphinx or similar indexing applications. However, they require some refactoring of the database structure and may take more time to implement than I have.
So what I decided on was to take each table and concatenate all its relevant columns into a newly added QUERY column. This QUERY column should also draw from columns of other relevant tables.
With this done, I will use the LIKE clause on the QUERY column of the table being searched to return results for specific domains (groups of related tables).
Since my database is not expected to be too huge (< 1mn rows in the biggest table), I am expecting reasonable query times.
Does anyone agree with this method or have a better idea?
You will not be happy with the solution of using LIKE with wildcards. It performs hundreds or thousands of times slower than using a fulltext search technology.
See my presentation Practical Full-Text Search in MySQL.
Instead of copying the values into a QUERY column, I would recommend copying the values into a MyISAM table where you have a FULLTEXT index defined. You could use triggers to do this.
You don't need to concatenate the values together, you just need the primary key column and each of your searchable text columns.
CREATE TABLE OriginalTable (
original_id SERIAL PRIMARY KEY,
author_id INT,
author_date DATETIME,
summary TEXT,
body TEXT
) ENGINE=InnoDB;
CREATE TABLE SearchTable (
original_id BIGINT UNSIGNED PRIMARY KEY, -- not auto-increment
-- author_id INT,
-- author_date DATETIME,
summary TEXT,
body TEXT,
FULLTEXT KEY (summary, body)
) ENGINE=MyISAM;
You'll want to add an index to your query column. If there is a wildcard at the beginning of the search expression, MySQL cannot use the index.
If you do any search other than "equals" (LIKE 'test') or "begins with" (LIKE 'test%'), MySQL will have to scan every row. For example, a "contains" search (LIKE '%test%') is unable to use the index.
You could allow an "ends with" search (LIKE '%test'), but you'd have to build a reversed column to index on, so that the query becomes a "begins with" search on the reversed text (LIKE 'tset%') and can use the index.
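To make the reversed-column idea concrete, here is a minimal Python sketch rather than MySQL itself. It treats an index as a sorted list and uses binary search to find a prefix range. The word list, the `prefix_range` helper and the `"\uffff"` upper-bound sentinel are assumptions made for illustration only.

```python
import bisect

def prefix_range(sorted_keys, prefix):
    """Return the keys starting with prefix, located by binary search."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # "\uffff" is a simplifying sentinel: it assumes no key contains that character.
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]

# Hypothetical column values.
words = ["contest", "latest", "test", "testing", "tester", "attest"]

# Ordinary index: sorted values, usable for "begins with".
forward_index = sorted(words)
# Hypothetical reversed column: each value stored backwards, then indexed.
reversed_index = sorted(w[::-1] for w in words)

# LIKE 'test%' on the original column.
begins_with = prefix_range(forward_index, "test")
# LIKE '%test' rewritten as LIKE 'tset%' on the reversed column.
ends_with = sorted(r[::-1] for r in prefix_range(reversed_index, "test"[::-1]))
```

A "contains" search has no such trick: matches can sit anywhere in the sorted order, so every entry has to be inspected.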
Any full scan is going to be slow, and the more rows, the slower it will be. The larger the field, the slower it will be.
You can see the limitation of using LIKE. Therefore, you might create a table called Tags, where you link individual key words to each entry rather than using the entire text, but I would still stick to "equals" and "begins with", even with tags.
Using LIKE without the aid of an index should be limited to the rare ad-hoc query or very small data sets.
No, it is not optimal, since it forces MySQL to read every row. But if your table is small (I don't know what <1mn means), then it could be acceptable to some extent.
Also, you can limit the search feature. For example, some sites allow no more than one search request every x minutes, while others force you to enter a CAPTCHA.
I have an application that basically functions like a grid where a user can view data and sort/filter by any of the columns. It's a very small amount of data (~200K rows / 50MB), but too big to comfortably fit in the browser and do it in javascript.
The crudest/simplest approach I've thought of is to store it in a MySQL table with an index on every single column (yes, every column). The database/table is about 99% read / 1% write, so I'm not too concerned if the insert times go up by 100x or so.
Is there any downside of doing the above? What kinds of things should I be concerned about? Are there any better (server-side) approaches for doing something like this?
If those are "words", toss them all into a single column, separated by spaces. Then add a single FULLTEXT index just on that column. If you have other non-'word' columns, you may need to index them, too.
Caveat: FULLTEXT has several limitations. And benefits (such as singular/plural).
Presumably, you will refuse to show anything unless they do some filtering? Don't tell me you want to let the user paginate through 200K rows!
You have not said whether they can filter on multiple columns.
You should construct the query in your app code. The syntax for fulltext is different than equality and different than range tests.
If you are tempted by the problematic EAV schema design, see http://mysql.rjweb.org/doc.php/eav
I have rambled on; I could be more focused if you gave us some clues of the data and the queries.
Example:
CREATE TABLE ... (
...
all_words TEXT NOT NULL,
LastUpdated DATETIME NOT NULL,
...
FULLTEXT(all_words)
)
SELECT ...
WHERE MATCH(all_words) AGAINST ('...' IN BOOLEAN MODE)
ORDER BY LastUpdated DESC
LIMIT 50;
(Note: Since only one index is used, and FULLTEXT has priority, an index on LastUpdated would not be useful for this example.)
I am really interested in how MySQL indexes work, more specifically, how can they return the data requested without scanning the entire table?
It's off-topic, I know, but if there is someone who could explain this to me in detail, I would be very, very thankful.
Basically an index on a table works like an index in a book (that's where the name came from):
Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one, until you found the topic (that's a full table scan).
On the other hand, an index has a list of keywords, so you'd consult the index and see that storage is mentioned on pages 113-120,231 and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).
Of course, how useful the index will be, depends on many things - a few examples, using the simile above:
if you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59,61-290, and 292 to 400. In such case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly - just scan the 10 pages and be done with it.
The index also needs to be useful - there's generally no point to index e.g. the frequency of the letter "L" per page.
The first thing you must know is that indexes are a way to avoid scanning the full table to obtain the result that you're looking for.
There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.
InnoDB and the B+Tree index
For InnoDB, the most common index type is the B+Tree based index, that stores the elements in a sorted order. Also, you don't have to access the real table to get the indexed values, which makes your query return way faster.
The "problem" about this index type is that you have to query for the leftmost value to use the index. So, if your index has two columns, say last_name and first_name, the order that you query these fields matters a lot.
So, given the following table:
CREATE TABLE person (
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
INDEX (last_name, first_name)
);
This query would take advantage of the index:
SELECT last_name, first_name FROM person
WHERE last_name = "John" AND first_name LIKE "J%"
But the following one would not
SELECT last_name, first_name FROM person WHERE first_name = "Constantine"
Because you're querying the first_name column first and it's not the leftmost column in the index.
This last example is even worse:
SELECT last_name, first_name FROM person WHERE first_name LIKE "%Constantine"
Because now, you're comparing the rightmost part of the rightmost field in the index.
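The leftmost-column rule can be sketched in Python by modelling the composite index as a list of tuples sorted by (last_name, first_name). This is only an analogy for how a B+Tree orders its keys; the sample rows and function names are made up for the example.

```python
import bisect

# Hypothetical rows of the person table as (last_name, first_name).
rows = [
    ("Smith", "Jane"), ("Constantine", "John"), ("John", "Jack"),
    ("John", "Mary"), ("Adams", "Constantine"), ("John", "Jill"),
]

# A composite index on (last_name, first_name) behaves like a list sorted by that tuple.
index = sorted(rows)

def by_last_name_and_first_prefix(last, first_prefix):
    """Leftmost column is fixed, so the matches form one contiguous range."""
    lo = bisect.bisect_left(index, (last, first_prefix))
    # "\uffff" is a simplifying upper-bound sentinel for the prefix.
    hi = bisect.bisect_left(index, (last, first_prefix + "\uffff"))
    return index[lo:hi]

def by_first_name(first):
    """No leftmost column given: matches are scattered, so every entry is checked."""
    return [entry for entry in index if entry[1] == first]

seek_result = by_last_name_and_first_prefix("John", "J")
scan_result = by_first_name("Constantine")
```

The first function touches only the neighbourhood of the matching range, while the second has to walk the whole list, which is the difference between the two queries above.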
The hash index
This is a different index type that unfortunately, only the memory backend supports. It's lightning fast but only useful for full lookups, which means that you can't use it for operations like >, < or LIKE.
Since it only works for the memory backend, you probably won't use it very often. The main case I can think of right now is where you create a temporary in-memory table with a set of results from another SELECT and then perform a lot of other SELECTs against this temporary table using hash indexes.
If you have a big VARCHAR field, you can "emulate" the use of a hash index when using a B-Tree, by creating another column and saving a hash of the big value on it. Let's say you're storing a url in a field and the values are quite big. You could also create an integer field called url_hash and use a hash function like CRC32 or any other hash function to hash the url when inserting it. And then, when you need to query for this value, you can do something like this:
SELECT url FROM url_table WHERE url_hash=CRC32("http://gnu.org");
The problem with the above example is that since the CRC32 function generates a quite small hash, you'll end up with a lot of collisions in the hashed values. If you need exact values, you can fix this problem by doing the following:
SELECT url FROM url_table
WHERE url_hash=CRC32("http://gnu.org") AND url="http://gnu.org";
It's still worth hashing even if the number of collisions is high, because you'll only perform the second comparison (the string one) against the rows whose hashes match.
Unfortunately, using this technique, you still need to hit the table to compare the url field.
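As a rough illustration of the hash-column trick outside the database, the sketch below uses Python's `zlib.crc32` as a stand-in for the `CRC32()` call in the queries above. The `hash_index` dictionary and the sample URLs are assumptions for the example, not anything MySQL exposes.

```python
import zlib
from collections import defaultdict

def url_hash(url):
    """32-bit CRC of the URL, playing the role of the url_hash column."""
    return zlib.crc32(url.encode("utf-8"))

# Hypothetical table contents.
urls = ["http://gnu.org", "http://example.com/a", "http://example.com/b"]

# Index on the small integer hash: hash value -> rows that share it.
hash_index = defaultdict(list)
for u in urls:
    hash_index[url_hash(u)].append(u)

def find(url):
    """Cheap integer lookup first, then the string comparison to weed out collisions."""
    candidates = hash_index.get(url_hash(url), [])
    return [c for c in candidates if c == url]
```

The string comparison only runs on the few candidates that share a hash, which is why the second condition in the query stays cheap.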
Wrap up
Some facts that you may consider every time you want to talk about optimization:
Integer comparison is way faster than string comparison. This is illustrated by the example of emulating a hash index in InnoDB.
Adding steps to a process can sometimes make it faster, not slower. For example, you can optimize a SELECT by splitting it into two steps: the first stores values in a newly created in-memory table, and the heavier queries then run against that table.
MySQL has other indexes too, but I think the B+Tree one is the most used ever and the hash one is a good thing to know, but you can find the other ones in the MySQL documentation.
I highly recommend you to read the "High Performance MySQL" book, the answer above was definitely based on its chapter about indexes.
Basically an index is a map of all your keys that is sorted in order. With a list in order, then instead of checking every key, it can do something like this:
1: Go to the middle of the list. Is the key there higher or lower than what I'm looking for?
2: If higher, go to the halfway point between the middle and the bottom; if lower, between the middle and the top.
3: Is it higher or lower? Jump to the middle point again, and so on.
Using that logic, you can find an element in a sorted list of 100 items in about 7 steps (roughly log2 of the list size), instead of checking every item.
Obviously there are complexities, but that gives you the basic idea.
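The halving steps can be written out as a short Python sketch. It counts how many comparisons a lookup needs. The `keys` list is invented for the example, and a real index is a B+Tree rather than a flat list, so treat this as the idea only.

```python
def binary_search(sorted_keys, target):
    """Return (position, steps); position is -1 when target is absent."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, steps

keys = list(range(0, 200, 2))  # 100 sorted keys: 0, 2, 4, ..., 198
position, steps = binary_search(keys, 150)
```

With 100 keys the loop never runs more than 7 times, whether or not the key exists, compared with up to 100 checks for a plain scan.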
Take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
How they work is too broad of a subject to cover in one SO post.
Here is one of the best explanations of indexes I have seen. Unfortunately it is for SQL Server and not MySQL. I'm not sure how similar the two are...
In MySQL InnoDB, there are two types of index.
The primary key, which is called the clustered index. Its index keys are stored together with the real record data in the B+Tree leaf nodes.
The secondary key, which is a non-clustered index. These indexes store only the primary key values along with their own index keys in the B+Tree leaf nodes. So when searching through a secondary index, InnoDB first finds the primary key value and then searches the primary key B+Tree to find the real data record. This makes a secondary index search slower than a primary index search. However, if all the selected columns are in the secondary index, there is no need to look up the primary index B+Tree at all. This is called a covering index.
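A toy Python model may help picture the two lookups. Plain dictionaries stand in for the B+Trees here, and the table, column names and rows are hypothetical.

```python
# Clustered (primary) index: primary key -> the full row.
clustered = {
    1: {"id": 1, "email": "a@example.com", "name": "Ann"},
    2: {"id": 2, "email": "b@example.com", "name": "Bob"},
}

# Secondary index on email: indexed value -> primary key only.
secondary_email = {row["email"]: pk for pk, row in clustered.items()}

def name_by_email(email):
    """Needs a column that is not in the secondary index: two lookups."""
    pk = secondary_email[email]    # first lookup: secondary index
    return clustered[pk]["name"]   # second lookup: clustered index

def id_by_email(email):
    """Only needs email and id, both present in the secondary index: covered."""
    return secondary_email[email]  # single lookup, the row itself is never read
```

The second function is the covering-index case: everything the query asks for is already in the secondary index entry.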
Take a look at these videos for more details about indexing.
Simple Indexing
You can create a unique index on a table. A unique index means that two rows cannot have the same index value. Here is the syntax to create an Index on a table
CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...);
You can use one or more columns to create an index. For example, we can create an index on tutorials_tbl using tutorial_author.
CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author)
You can create a simple index on a table. Just omit UNIQUE keyword from the query to create simple index. Simple index allows duplicate values in a table.
If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name. Note that MySQL versions before 8.0 parse DESC in an index definition but ignore it.
mysql> CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author DESC)
Adding some visual representation to the list of answers.
MySQL uses an extra layer of indirection: secondary index records point to primary index records, and the primary index itself holds the on-disk row locations. If a row offset changes, only the primary index needs to be updated.
Caveat: Disk data structure looks flat in the diagram but actually is a B+ tree.
Source: link
I want to add my 2 cents. I am far from being a database expert, but I've recently read up a bit on this topic; enough for me to try and give an ELI5. So, here's my layman's explanation.
I understand it as such that an index is like a mini-mirror of your table, pretty much like an associative array. If you feed it with a matching key then you can just jump to that row in one "command".
But if you didn't have that index / array, the query interpreter must use a for-loop to go through all rows and check for a match (the full-table scan).
Having an index has the "downside" of extra storage (for that mini-mirror), in exchange for the "upside" of looking up content faster.
Note that (depending on your DB engine) creating primary, foreign or unique keys automatically sets up a corresponding index as well. That same principle is basically why and how those keys work.
Let's suppose you have a book, probably a novel, a thick one with lots of things to read, hence lots of words.
Now, hypothetically, you brought two dictionaries, each consisting only of words that are used at least once in the novel. All words in the two dictionaries are stored in typical alphabetical order. In hypothetical dictionary A, each word is printed only once, while in hypothetical dictionary B each word is printed as many times as it appears in the novel. Remember, words are sorted alphabetically in both dictionaries.
Now you get stuck on a word while reading the novel and need to find its meaning in one of those hypothetical dictionaries. What will you do? Surely you will jump to that word in a few steps to find its meaning, rather than look up the meaning of every word in the novel from the start until you reach the word that is bugging you.
This is how an index works in SQL. Consider Dictionary A as the PRIMARY INDEX, Dictionary B as a KEY/SECONDARY INDEX, and your desire to get the meaning of the word as a QUERY/SELECT STATEMENT.
The index helps fetch the data very quickly. Without an index, you have to look for the data from the start, an unnecessarily time-consuming and costly task.
For more about indexes and their types, look at this.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.
Indexing adds a data structure with columns for the search conditions and a pointer.
The pointer is the address on disk of the row with the rest of the information.
The index data structure is sorted to optimize query efficiency.
The query looks for the specific row in the index; the index refers to the pointer, which finds the rest of the information.
The index reduces the number of rows the query has to search through from 17 to 4.
I am trying to understand indexes in MySQL. I know that an index created in a table can speed up executing queries and it can slow down the inserting and updating of rows.
When creating an index, I used this query on a table called authors that contains (AuthorNum, AuthorFName, AuthorLName, ...)
Create index Index_1 on Authors ([What to put here]);
I know I have to put a column name, but which one?
Do I have to put the column name that will be compared in the WHERE clause when a user queries the table, or what?
The Anatomy of an Index
An index is a distinct data structure within a database and is a form of data redundancy. Its primary purpose is to provide an ordered representation of the indexed data through a logical ordering that is independent of the physical ordering. This is done using a doubly linked list and a tree structure known as the balanced search tree (B-tree). B-trees are nice because they keep data sorted and allow searches, access, insertions, and deletions in logarithmic time. Because of the doubly linked list, we can easily move backwards or forwards along the index as needed for various queries. Inserts become simple since we only have to rearrange pointers to the different pieces of data. Databases use these doubly linked lists to connect the leaf nodes (usually of a B+ tree or B-tree), each of which is stored in a page, and to establish the logical ordering between the leaf nodes. Operations like UPDATE or INSERT become slower because they actually involve two write operations (one for the table data and one for the index data).
Defining an Optimal Index With WHERE
To define an optimal index you must not only understand how indexes work, but you must also understand how the application queries the data. E.g., you must know the column combinations that appear in the WHERE clause.
A common restriction with queries on LAST_NAME and FIRST_NAME columns deals with case sensitivity. For example, instead of doing an exact search like Hotinger we would prefer to match all results such as HoTingEr and so on. This is very easy to do in a WHERE clause: we just say WHERE UPPER(LAST_NAME) = UPPER('Hotinger')
However, if we define an index on LAST_NAME and run this query, it will actually do a full table scan because the query is not on LAST_NAME but on UPPER(LAST_NAME). From the database's perspective, this is completely different. So, in this case you should define the index on UPPER(LAST_NAME) instead. (In MySQL, indexing an expression like this requires functional key parts, available from 8.0.13.)
Indexes do not necessarily have to be for one column. For example, if the primary key is a composite key (consisting of multiple columns), it will create a concatenated index, also known as a combined index. Note that the column order of a concatenated index has a significant impact on its usability and scalability, so it must be chosen carefully. Basically, the leading columns of the index should be the ones your WHERE clauses actually filter on; the order in which conditions are written inside the WHERE clause itself does not matter.
Defining an Optimal Index With LIKE
The position of the wildcard characters makes a huge difference. LIKE clauses only use the characters before the wildcard during tree traversal; the rest do not narrow the scanned index range. The more selective the prefix of the LIKE clause, the narrower the scanned index range becomes. This makes the index lookup faster. As a tip, avoid LIKE clauses which lead with wildcards, like "%OTINGER%". For full-text searches, MySQL offers the MATCH and AGAINST keywords. Starting with MySQL 5.6, InnoDB tables can have full-text indexes too (MyISAM supported them earlier). Look at Full-Text Search Functions from MySQL for more in-depth discussion on indexing these results.
Yes, generally you need an index on the column or columns that you compare in the WHERE clause of your queries to speed up queries.
If you search by AuthorFName, then you create an index on that column. If they search by AuthorLName, then you create an index on that column.
In this case though, maybe what you should be looking at is a FULLTEXT index. That would allow users to enter fuzzy queries, which would return a number of results ordered by relevance.
From the MySQL Manual:
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially. If you need to access most of the rows, it is faster to
read sequentially, because this minimizes disk seeks.
An index usually means a B-Tree. Understand the structure of the B-Tree and you'll understand what index can and cannot do.
In your particular case:
WHERE AuthorLName = 'something' and WHERE AuthorLName LIKE 'something%' can be sped-up by an index on {AuthorLName}.
WHERE AuthorLName = 'something' AND AuthorFName = 'something else' can be sped-up by a composite index on {AuthorLName, AuthorFName} or {AuthorFName, AuthorLName}.
WHERE AuthorLName = 'something' OR AuthorFName = 'something else' (which doesn't make much sense, but is here as an example) can be sped-up by having two indexes: on {AuthorLName} and on {AuthorFName}.
WHERE AuthorLName LIKE '%something' cannot be sped-up by a B-Tree index (consider full-text indexing).
Etc...
See Use The Index, Luke! for a much more thorough treatment of the subject than possible in a simple SO post.
Limited length index:
When using text columns or very large varchar columns, you won't be able to create an index over the entire length of the text/varchar. There are limits (around 1024 ASCII characters in length; the exact limit depends on the storage engine and row format).
In such a case you specify the length in the index declaration.
CREATE INDEX `my_limited_length_index` ON `my_table`(`long_text_content`(512));
-- please notice the use of the numeric length of the index after the column name
Processed value index (apparently available in PostgreSQL not MySQL):
Indexes are not exclusively built from one column; some may be built from multiple columns, and others may be built from just part of the information a column holds. For example, if you have a full datetime column but you know you're only going to filter records by date, you can build an index based on the datetime column but containing only the date info.
-- `my_table` has a `created` column of type timestamp
CREATE INDEX `my_date_created` ON `my_table`(DATE(`created`));
-- please notice the use of the DATE function which extracts only
-- the date from the `created` timestamp
An index should span the columns you are going to use in the WHERE clause.
To better understand, here is an example:
SELECT * FROM Authors WHERE AuthorNum > 10 AND AuthorLName LIKE 'A%';
SELECT * FROM Authors WHERE AuthorLName LIKE 'Be%';
If you often run the queries shown above, you are highly advised to have two indexes:
Create index AuthNum_AuthLName_Index on Authors (AuthorNum, AuthorLName);
Create index AuthLName_Index on Authors (AuthorLName);
The key thing to remember: an index should cover the same combination of columns used in your WHERE clauses.
I am really interested in how MySQL indexes work, more specifically, how can they return the data requested without scanning the entire table?
It's off-topic, I know, but if there is someone who could explain this to me in detail, I would be very, very thankful.
Basically an index on a table works like an index in a book (that's where the name came from):
Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one, until you found the topic (that's a full table scan).
On the other hand, an index has a list of keywords, so you'd consult the index and see that storage is mentioned on pages 113-120,231 and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).
Of course, how useful the index will be, depends on many things - a few examples, using the simile above:
if you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59,61-290, and 292 to 400. In such case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly - just scan the 10 pages and be done with it.
The index also needs to be useful - there's generally no point to index e.g. the frequency of the letter "L" per page.
The first thing you must know is that indexes are a way to avoid scanning the full table to obtain the result that you're looking for.
There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.
InnoDB and the B+Tree index
For InnoDB, the most common index type is the B+Tree based index, that stores the elements in a sorted order. Also, you don't have to access the real table to get the indexed values, which makes your query return way faster.
The "problem" about this index type is that you have to query for the leftmost value to use the index. So, if your index has two columns, say last_name and first_name, the order that you query these fields matters a lot.
So, given the following table:
CREATE TABLE person (
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
INDEX (last_name, first_name)
);
This query would take advantage of the index:
SELECT last_name, first_name FROM person
WHERE last_name = "John" AND first_name LIKE "J%"
But the following one would not
SELECT last_name, first_name FROM person WHERE first_name = "Constantine"
Because you're querying the first_name column first and it's not the leftmost column in the index.
This last example is even worse:
SELECT last_name, first_name FROM person WHERE first_name LIKE "%Constantine"
Because now, you're comparing the rightmost part of the rightmost field in the index.
The hash index
This is a different index type that unfortunately, only the memory backend supports. It's lightning fast but only useful for full lookups, which means that you can't use it for operations like >, < or LIKE.
Since it only works for the memory backend, you probably won't use it very often. The main case I can think of right now is the one that you create a temporary table in the memory with a set of results from another select and perform a lot of other selects in this temporary table using hash indexes.
If you have a big VARCHAR field, you can "emulate" the use of a hash index when using a B-Tree, by creating another column and saving a hash of the big value on it. Let's say you're storing a url in a field and the values are quite big. You could also create an integer field called url_hash and use a hash function like CRC32 or any other hash function to hash the url when inserting it. And then, when you need to query for this value, you can do something like this:
SELECT url FROM url_table WHERE url_hash=CRC32("http://gnu.org");
The problem with the above example is that since the CRC32 function generates a quite small hash, you'll end up with a lot of collisions in the hashed values. If you need exact values, you can fix this problem by doing the following:
SELECT url FROM url_table
WHERE url_hash=CRC32("http://gnu.org") AND url="http://gnu.org";
It's still worth to hash things even if the collision number is high cause you'll only perform the second comparison (the string one) against the repeated hashes.
Unfortunately, using this technique, you still need to hit the table to compare the url field.
Wrap up
Some facts that you may consider every time you want to talk about optimization:
Integer comparison is way faster than string comparison. It can be illustrated with the example about the emulation of the hash index in InnoDB.
Maybe, adding additional steps in a process makes it faster, not slower. It can be illustrated by the fact that you can optimize a SELECT by splitting it into two steps, making the first one store values in a newly created in-memory table, and then execute the heavier queries on this second table.
MySQL has other indexes too, but I think the B+Tree one is the most used ever and the hash one is a good thing to know, but you can find the other ones in the MySQL documentation.
I highly recommend you to read the "High Performance MySQL" book, the answer above was definitely based on its chapter about indexes.
Basically an index is a map of all your keys that is sorted in order. With a list in order, then instead of checking every key, it can do something like this:
1: Go to middle of list - is higher or lower than what I'm looking for?
2: If higher, go to halfway point between middle and bottom, if lower, middle and top
3: Is higher or lower? Jump to middle point again, etc.
Using that logic, you can find an element in a sorted list in about 7 steps, instead of checking every item.
Obviously there are complexities, but that gives you the basic idea.
Take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
How they work is too broad of a subject to cover in one SO post.
Here is one of the best explanations of indexes I have seen. Unfortunately it is for SQL Server and not MySQL. I'm not sure how similar the two are...
In MySQL InnoDB, there are two types of index.
Primary key which is called clustered index. Index key words are stored with
real record data in the B+Tree leaf node.
Secondary key which is non clustered index. These index only store primary key's key words along with their own index key words in the B+Tree leaf node. So when searching from secondary index, it will first find its primary key index key words and scan the primary key B+Tree to find the real data records. This will make secondary index slower compared to primary index search. However, if the select columns are all in the secondary index, then no need to look up primary index B+Tree again. This is called covering index.
Take at this videos for more details about Indexing
Simple Indexing
You can create a unique index on a table. A unique index means that two rows cannot have the same index value. Here is the syntax to create an Index on a table
CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...);
You can use one or more columns to create an index. For example, we can create an index on tutorials_tbl using tutorial_author.
CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author)
You can create a simple index on a table. Just omit UNIQUE keyword from the query to create simple index. Simple index allows duplicate values in a table.
If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name.
mysql> CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author DESC)
Adding some visual representation to the list of answers.
MySQL uses an extra layer of indirection: secondary index records point to primary index records, and the primary index itself holds the on-disk row locations. If a row offset changes, only the primary index needs to be updated.
Caveat: Disk data structure looks flat in the diagram but actually is a
B+ tree.
Source: link
I want to add my 2 cents. I am far from being a database expert, but I've recently read up a bit on this topic; enough for me to try and give an ELI5. So, here's may layman's explanation.
I understand it as such that an index is like a mini-mirror of your table, pretty much like an associative array. If you feed it with a matching key then you can just jump to that row in one "command".
But if you didn't have that index / array, the query interpreter must use a for-loop to go through all rows and check for a match (the full-table scan).
Having an index has the "downside" of extra storage (for that mini-mirror), in exchange for the "upside" of looking up content faster.
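Here is that picture as a rough sketch in plain Python. It is only an illustration of the analogy, not how a database engine stores its indexes (those are usually B-trees), and the sample rows are made up.

```python
# Made-up rows; a real table would live on disk, not in a list.
rows = [
    {"id": 1, "author": "alice", "title": "Indexes"},
    {"id": 2, "author": "bob", "title": "Joins"},
    {"id": 3, "author": "carol", "title": "Triggers"},
]


def full_scan(rows, author):
    """No index: check every row in turn until one matches."""
    checked = 0
    for row in rows:
        checked += 1
        if row["author"] == author:
            return row, checked
    return None, checked


# The "mini-mirror": an associative array from key to row.
# This assumes authors are unique; it costs extra memory up front.
index = {row["author"]: row for row in rows}


def index_lookup(index, author):
    """With an index: jump straight to the row in one step."""
    return index.get(author)


print(full_scan(rows, "carol"))
print(index_lookup(index, "carol"))
```

The dict has to be kept in sync whenever rows change, which is the same maintenance cost a database pays on every write to an indexed column.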
Note that (depending on your DB engine) creating primary, foreign or unique keys automatically sets up a corresponding index as well. That same principle is basically why and how those keys work.
Let's suppose you have a book, probably a novel, a thick one with lots of things to read, hence lots of words.
Now, hypothetically, you have two dictionaries that contain only the words used at least once in the novel. Both dictionaries are sorted alphabetically. In hypothetical dictionary A, each word is printed only once, while in hypothetical dictionary B each word is printed as many times as it appears in the novel.
Now you get stuck on a word while reading the novel and need to find its meaning in one of those hypothetical dictionaries. What will you do? Surely you will jump to that word in a few steps, rather than reading the meaning of every word from the start until you reach the one that is bugging you.
This is how an index works in SQL. Think of dictionary A as the PRIMARY INDEX, dictionary B as a KEY/SECONDARY INDEX, and your wish to find the meaning of the word as a QUERY/SELECT STATEMENT.
The index helps fetch the data very quickly. Without an index, you have to look through the data from the start, which is an unnecessarily time-consuming and costly task.
For more about indexes and their types, look at this.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.
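You can watch the planner switch from reading the whole table to using an index. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL, with a hypothetical question table and index name. In MySQL you would run EXPLAIN on the same query instead.

```python
import sqlite3

# SQLite stands in for MySQL; 'question' and 'idx_title' are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT * FROM question WHERE title = 'indexing'"

# No index on title yet: the only option is to read every row.
before = plan(query)

conn.execute("CREATE INDEX idx_title ON question (title)")

# Same query, but now the planner can seek directly through the index.
after = plan(query)

print(before)
print(after)
```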
Indexing adds a data structure with columns for the search conditions and a pointer.
The pointer is the address on disk of the row with the rest of the information.
The index data structure is sorted to optimize query efficiency.
The query looks for the specific row in the index; the index refers to the pointer, which finds the rest of the information.
The index reduces the number of rows the query has to search through from 17 to 4.
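The "sorted structure plus pointer" idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a list position plays the role of the disk address, the sample data is made up, and a binary search over a sorted list stands in for the B-tree a real engine would use.

```python
from bisect import bisect_left

# The "table": rows in arrival order. The list position plays the
# role of the row's address on disk (an assumption for illustration).
table = [
    ("carol", "Triggers"),
    ("alice", "Indexes"),
    ("dave", "Views"),
    ("bob", "Joins"),
]

# The index: (search key, pointer) pairs, kept sorted by key.
index = sorted((author, pos) for pos, (author, _title) in enumerate(table))
keys = [key for key, _pointer in index]


def lookup(author):
    """Binary-search the sorted keys, then follow the pointer to the row."""
    i = bisect_left(keys, author)
    if i < len(keys) and keys[i] == author:
        _key, pointer = index[i]
        return table[pointer]
    return None


print(index)
print(lookup("bob"))
```

Because the keys are sorted, each comparison halves the remaining search range, so the number of entries inspected grows far more slowly than the table does.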
Is there an alternative to LIKE? Note that I cannot use full-text search.
Here is my mysql code.
SELECT *
FROM question
WHERE content LIKE '%$search_each%'
OR title LIKE '%$search_each%'
OR summary LIKE '%$search_each%'
Well, MySQL has regular expressions but I would like to ask you what the problem is with multiple LIKEs.
I know it won't scale well when tables get really large but that's rarely a concern for the people using MySQL (not meaning to be disparaging to MySQL there, it's just that I've noticed a lot of people seem to use it for small databases, leaving large ones to the likes of Oracle, DB2 or SQLServer (or NoSQL where ACID properties aren't so important)).
If, as you say:
I plan to use it for really large sites.
then you should avoid LIKE altogether. And, if you cannot use full text search, you'll need to roll your own solution.
One approach we've used in the past is to use insert/update/delete triggers on the table to populate yet another table. The insert/update trigger should:
evaluate the string in question;
separate it into words;
throw away inconsequential words (all-numerics, noise words like 'at', 'the', 'to', and so on); then
add those words to a table with a marker to the row in the original table.
Then use that table for searching, which is almost certainly much faster than multiple LIKEs. It's basically a roll-your-own sort-of-full text search where you can fine-tune and control what actually should be indexed.
The advantage of this is speed during the select process with a minor cost during the update process. Keep in mind this is best for tables that are read more often than written (most of them) since it amortises the cost of indexing the individual words across all reads. There's no point in incurring that cost on every read, better to do it only when the data changes.
And, by the way, the delete trigger will simply delete all entries in the indexing table which refer to the real record.
The table structures would be something like:
Comments:
    id          int
    comment     varchar(200)
    -- others.
    primary key (id)
Words:
    id          int
    word        varchar(50)
    primary key (id)
    index       (word)
WordsInComments:
    wordid      int
    commentid   int
    primary key (wordid,commentid)
    index       (commentid)
Setting the many-to-many relationship to id-id (i.e., separate Words and WordsInComments tables) instead of id-text (combining them into one) is the correct thing to do for third normal form but you may want to look at trading off storage space for speed and combining them, provided you understand the implications.
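As a rough sketch of how the word-index approach fits together, here is a small runnable version. Several things in it are assumptions rather than part of the answer above: it uses Python's sqlite3 module instead of MySQL, the indexing runs in application code (index_comment) where the answer uses triggers, the noise-word list is made up, and Words.word is declared UNIQUE (instead of a plain index) so that a word is stored only once.

```python
import re
import sqlite3

# Hypothetical noise-word list; tune it for your own data.
NOISE_WORDS = {"a", "an", "and", "at", "of", "the", "to"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Comments (id INTEGER PRIMARY KEY, comment VARCHAR(200));
CREATE TABLE Words (id INTEGER PRIMARY KEY, word VARCHAR(50) UNIQUE);
CREATE TABLE WordsInComments (
    wordid INT,
    commentid INT,
    PRIMARY KEY (wordid, commentid)
);
CREATE INDEX idx_wic_comment ON WordsInComments (commentid);
""")


def extract_words(text):
    """Split into lowercase words; drop all-numerics and noise words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if not w.isdigit() and w not in NOISE_WORDS}


def index_comment(comment_id, text):
    """Stand-in for the insert/update trigger: rebuild this row's entries."""
    conn.execute(
        "DELETE FROM WordsInComments WHERE commentid = ?", (comment_id,)
    )
    for word in extract_words(text):
        conn.execute("INSERT OR IGNORE INTO Words (word) VALUES (?)", (word,))
        (word_id,) = conn.execute(
            "SELECT id FROM Words WHERE word = ?", (word,)
        ).fetchone()
        conn.execute(
            "INSERT INTO WordsInComments (wordid, commentid) VALUES (?, ?)",
            (word_id, comment_id),
        )


def add_comment(text):
    """Insert a comment and index its words; returns the new comment id."""
    cur = conn.execute("INSERT INTO Comments (comment) VALUES (?)", (text,))
    index_comment(cur.lastrowid, text)
    return cur.lastrowid


def search(word):
    """Exact-word lookup through the index tables, with no LIKE involved."""
    rows = conn.execute(
        """SELECT c.id
           FROM Words w
           JOIN WordsInComments wc ON wc.wordid = w.id
           JOIN Comments c ON c.id = wc.commentid
           WHERE w.word = ?
           ORDER BY c.id""",
        (word.lower(),),
    ).fetchall()
    return [row[0] for row in rows]
```

For example, after add_comment("The index is at 42 Main"), searching for "index" finds that comment, while "the" and "42" find nothing because they were thrown away at indexing time. A delete would remove the matching WordsInComments rows in the same way index_comment does before re-inserting.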