I have a large database containing more than five million records, this database has three fields (ID, name, text), the field ID has a primary key, the field name has a FULLTEXT index.
I want to create a search engine for my site that seeks in the field name, I use FULLTEXT index but has the disadvantage not to accept the keywords of less than four characters, so I decided to delete it and put a INDEX KEY on the field name and use the following request:
EXPLAIN SELECT * FROM table WHERE locate ('search', name) > 0;
the problem is that this application does not use the index KEY field name,
but this request:
EXPLAIN SELECT name FROM table WHERE locate ('search', name) > 0;
uses the INDEX KEY,
I do not know why when I select all fields MYSQL does not use index.
In your opinion how to solve this problem and if possible a better alternative.
You can set the minimum amount of characters for full text indexes in the mysql configuration. I am not at my computer at the moment to find a example however this page might help you: http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html
Update:
Back at my pc. So regarding why mysql would use a index on the SELECT name FROM table WHERE locate ('search', name) > 0; statement is very simple. When you create a index on the name field the index contains the actual name field i.e. the value of the name field, so when you select only the name field mysql can do a search and retrieve all the data required from the index. So in this scenario mysql has to do one operation to retrieve the data which match the searched values in the index and return them.
The SELECT name FROM table WHERE locate ('search', name) > 0; however needs the other data fields as well. Since only the name field's value is stored in the index mysql will have to read the index and then the table to retrieve the other fields. So in this scenario mysql has to match the values in the index then find the values on the table and then return them. This means mysql has to do 2 operations which is double the amount of work compared to the previous scenario.
Since 5 million rows is still very small it is probably faster for mysql to just loop through the table and retrieve the rows. As you add more rows mysql will probably start using the index once the cost of looping through the table is higher than the cost of reading the index and then looking up the values on the table.
Hope that makes sense.
Related
I would like to know if it is necessary to create an index for all fields within a table if one of your queries will use SELECT *.
To explain, if we had a table that 10M records and we did a SELECT * query on it would the query run faster if we have created an index for all fields within the table or does MySQL handle SELECT * in a different way to SELECT first_field, a_field, last_field.
To my understanding, if I had a query that did SELECT first_field, a_field FROM table then it would bring performance benefits if we created an index on first_field, a_field but if we use SELECT * is there even a benefit from creating an index for all fields?
Performing a SELECT * FROM mytable query would have to read all the data from the table. This could, theoretically, be done from an index if you have an index on all the columns, but it would be just faster for the database to read the table itself.
If you have a where clause, having an index on (some of) the columns you have conditions on may dramatically improve the query's performance. It's a gross simplification, but what basically happens is the following:
The appropriate rows are filtered according to the where clause. It's much faster to search for these rows in an index (which is, essentially, a sorted tree) than a table (which is an unordered set of rows).
For the columns that where in the index used in the previous step the values are returned.
For the columns that aren't, the table is accessed (according to a pointer kept in the index).
indexing a mysql table for a column improves performance when there is a need to search or edit a row/record based on that column of that table.
for example, if there is an 'id' column and if it is a primary key; And in that case if you want to search a record using where clause on that 'id' column then you don't need to create index for the 'id' column because primary key column will act as an indexed column.
In another case, if there is an 'pid' column in the table and if it is not a primary key; Then in order to search based on 'pid' column then to improve performance it is better to create an index for the 'pid' column. That will make query fast to search the expected record.
My Table Schema is
CREATE TABLE ITEMS(Time , Name, Token) PRIMARY_KEY(Time, NAME).
Where Time is the timestamp the item is created. When i do the following query
SELECT Name, Token from ITEMS where name = shoes
it takes a while to load the data as my table has more than million rows.
Should i need to add INDEX for faster retrieval of data? I already have an INDEX for this table as there is a PRIMARY KEY.
You need a separate index for name. The primary key index can handle name, but only in conjunction with time.
If you defined it instead as:
PRIMARY_KEY(Name, Time)
Then your query could take advantage of the index.
MySQL has pretty good documentation on composite indexes here.
When you create index using PRIMARY_KEY(Time, NAME), these values will be concatenated. There is no way for MySQL to use this index to search by NAME.
BTW, you may get lot of useful hints from query optimiser if you use EXPLAN keyword in front of your query like this:
EXPLAIN SELECT Name, Token from ITEMS where name = shoes
Keep your eye on output marked "where". This tells how many records MySQL needs to fetch and examine manually after all indexes are exhausted. No need to wait or test in blind.
I'm still trying to get my head around the best way to use INDEXES in MySQL. How do you know when to merge them together and when to have them separate?
Below are the indexes from the Wordpress posts table. See how post_name, post_parent and post_author are seperate entries? And then they have type_status_date which is a mixture of 4 fields?
http://img215.imageshack.us/img215/5976/screenshot20120426at431.png
I don't understand the logic behind this? Can anyone enlighten me?
Going to be a bit of a long answer but here we go. Please note I am not going to deal with the differences in database engines here(MyISAM and InnoDB have distinct way of implementing what I am trying to describe)
First thing you have to understand about a index is that it is a separate data structure stored on disk. Normally this is a b-tree data structure containing the column(s) that you have indexed and also contain a pointer to the row in the table(this pointer is normally the primary key).
The only index that is stored with the data is the primary key index. Thus a primary key index IS the table.
Lets assume you have following table definition.
CREATE TABLE `Student` (
`StudentNumber` INT NOT NULL ,
`Name` VARCHAR(32) NULL ,
`Surname` VARCHAR(32) NULL ,
`StudentEmail` VARCHAR(32) NULL ,
PRIMARY KEY (`StudentNumber`) );
Since we have a primary key on StudentID there will be a index containing the primary key and the other columns in the index. If you had to look at the data in the index you would probably see something like this.
1 , John ,Doe ,Jdoe#gmail.com
As you can see this is the table data once again showing you that the primary key index IS the table.
The StudentNumber column is indexed which allows your to effectively search on it the rest of the data is stored with the key. Thus if ran the following query:
SELECT * FROM Student WHERE StudentNumber=1
MySQL would use the primary index to quickly find the row and the read the data stored with the indexed column. Since there is a index MySQL can use the index to do a effective binary seek operation on the b-tree.
Also when it comes to retrieving the data after doing the search MySQL can read the data from the index thus we are using 1 operation in the index to retrieve the data. Now if I ran the following query:
SELECT * FROM Student WHERE Name ='Joe'
MySQL would check if there is a index that it could use to speed the query up. However in my case there is no index on name so MySQL would do a sequential read from the table one row at a time from the first row to the last.
At each row it would evaluate the row against the where clause and return matching row. So basically it reads the primary key index from top to bottom. Remember the primary key index is the table.
If I ran the following statement:
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_name` (`Name` ASC) ;
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_surname` (`Surname` ASC) ;
MySQL would create new indexes on the Student table. This will be stored away from the table on disk and the data inside would look something like this:
Data in ix_Name
John, 1 <--PRIMARY KEY VALUE
Data in ix_Surname
Doe, 1 <--PRIMARY KEY VALUE
Notice the data in the ix_Name index is the name and the primary key value. Great so if I ran the previous select statement MySQL would then read the ix_name index and get the primary key value for matching items and then use the primary key index to get the rest of the data.
So the number of operations to get the data from the index is 2. The matching rows are found in the index and then a lookup happens on the primary key to get the row data out.
You now have the following query:
SELECT * FROM Student WHERE Name='John' AND surname ='Doe'
Here MySQL cant use both indexes as it would be a waste of operations. If MySQL had to use both indexes in this query the following would happen(this should not happen).
1 Find in the ix_Name the rows with the value John
2 Read the primary key that matches to get the row data
3 Store the matching results
4 Find in the ix Surname the rows with the value Doe
5 Read the primary key that matches to get row data.
6 Store the matching results
7 Take the Name results and Surname results and merge them
8 Return query results.
This is really a waste of IO as MySQL would then read the table twice. Basically using one index would be better than trying to use two(I will explain in a momnet why). MySQL will choose 1 index to use in a this simple query.
So how does MySQL decide on which index to use?
MySQL keeps statistics around indexes internally. These statistics tell MySQL basically how unique a index is. So for the sake of argument lets say the surname index (ix_surname)was more unique than the name index(ix_name) MySQL would use the surname index (ix_surname).
Thus query retrieval would be like this:
1 Use the ix_surname and find rows that match the value Doe
2 Read the primary key and apply the filter for the value John on the actual column data in the row.
3 Return the matched row.
As you can see the number of operations in this search is much less. I have over simplified a lot of the technical detail. Indexing is a interesting thing to master but you have to look at it from the perspective of how do I get the data with the minimal amount of IO.
Hope it is as clear as mud now!
MySQL cannot normally use more than one index at a time. That means, for instance, that when you have a query that filters or sorts on two fields you put them both into the same index.
WordPress likely has a common query that filters and/or sorts on post_type, post_status and post_date. Making an educated guess as to what they stand for, this would likely be the core query for WordPress's Post listing pages. So the three fields are put into the same index.
Assume I have this table:
create table table_a (
id int,
name varchar(25),
address varchar(25),
primary key (id)
) engine = innodb;
When I run this query:
select * from table_a where id >= 'x' and name = 'test';
How will MySQL process it? Will it pull all the id's first (assume 1000 rows) then apply the where clause name = 'test'?
Or while it looks for the ids, it is already applying the where clause at the same time?
As id is the PK (and no index on name) it will load all rows that satisfy the id based criterion into memory after which it will filter the resultset by the name criterion. Adding a composite index containing both fields would mean that it would only load the records that satisfy both criteria. Adding a separate single column index on the name field may not result in an index merge operation, in which case the index would have no effect.
Do you have indexes on either column? That may affect the execution plan. The other thing is one might cast the 'x'::int to ensure a numeric comparison instead of a string comparison.
For the best result, you should have a single index which includes both of the columns id and name.
In your case, I can't answer the affect of the primary index to that query. That depends on DBMS's and versions. If you really don't want to put more index (because more index means slow write and updates) just populate your table with like 10.000.000 random results, try it and see the effect.
you can compare the execution times by executing the query first when the id comes first in the where clause and then interchange and bring the name first. to see an example of mysql performance with indexes check this out http://www.mysqlperformanceblog.com/2006/06/02/indexes-in-mysql/
You can get information on how the query is processed by running EXPLAIN on the query.
If the idea is to optimize that query then you might want to add an index like:
alter table table_a add unique index name_id_idx (name, id);
I've got a table in a MySQL database that has the following fields:
ID | GENDER | BIRTHYEAR | POSTCODE
Users can search the table using any of the fields in any combination (i.e., SELECT * FROM table WHERE GENDER = 'M' AND POSTCODE IN (1000, 2000); or SELECT * FROM table WHERE BIRTHYEAR = 1973;)
From the MySQL docs, it uses left indexing. So if I create an index on all 4 columns it won't use the index if the ID field isn't used. Do I need to create an index for every possible combination of field (ID; ID/GENDER; ID/BIRTHYEAR; etc.) or will creating one index for all fields be sufficient?
If it makes any difference, there are upwards of 3 million records in this table.
In this situation I typically log search criteria, number of results returned and time taken to perform the search. Just because you're creating the flexibility to search by any field doesn't mean your users make use of this flexibility. I'd normally create indexes on sensible combinations and then once I've determined the usage patterns drop the lowly used indexes or create new unsuspected indexes.
I'm not sure if MySQL supports statistics or histograms for skewed data but the index on gender may or may not work. If MySQL supports statistics then this will indicate the selectivity of an index. In a general population an index on a field with a 50/50 split won't help. If you're sample data is computer programmers and the data is 95% males then a search for females would use the index.
Use EXPLAIN.
(I'd say, use Postgres, too, lol).
It seems recent versions of MySQL can use several indexes in the same query, they call this Index Merge. In this case 1 index per column will be enough.
Gender is a special case, since selectivity is 50% you don't need an index on it, it would be counterproductive.
Creating indexes on single fields is useful but it would be really useful if your data was of varchar type and each record had a different value, since birthyear and postcode are numbers they are already well indexed.
You can index birthyear because it should be different for many of the records (but up to 120 birthyears in total at max I guess).
Gender in my opinion doesn't need an index.
You can find out what field combinations are most likely to give different results and index those, like: birthyear - postcode, id - birthyear, id - postcode.