What is MySQL indexing and how do you create an index? - mysql

Okay, MySQL indexing. Is indexing nothing more than having a unique ID for each row that will be used in the WHERE clause?
When indexing a table, does the process add any information to the table? For instance, another column or value somewhere.
Does indexing happen on the fly when retrieving values, or are values placed into the table much like an insert or update?
Any more information to clearly explain MySQL indexing would be appreciated. And please don't just post a link to the MySQL documentation; it is confusing, and it is always better to get a personal response from a professional.
Lastly, why is indexing different from telling MySQL to look for values between two values? For example: WHERE create_time >= 'AweekAgo'
I'm asking because one of my tables has 220,000+ rows and it takes more than a minute to return values with a very simple MySQL SELECT statement, and I'm hoping indexing will speed this up.
Thanks in advance.

You were downvoted because you didn't make an effort to read or search for what you are asking. A simple Google search would have shown you the benefits and drawbacks of a database index. Here is a related question on StackOverflow. I am sure there are numerous questions like that.
To put it without jargon: it is easier to locate books in a library if you arrange them on shelves numbered according to their subject area. You can then tell somebody to go to a specific location and pick up the book. That is what an index does.
Another example: imagine an alphabetically ordered admission list. If your name starts with Z, you can skip A to Y and go straight to Z, which is much faster. Without the ordering, you would have to search the whole list, and you might even miss your name if you didn't look carefully.
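The alphabetical-list analogy can be sketched in a few lines of Python. This is only an illustration, not how MySQL is implemented: the name list and the helpers scan_lookup and sorted_lookup are made up for the example, and a sorted list with binary search stands in for a real index structure.

```python
import bisect

# Hypothetical admission list; the names are invented for this sketch.
names = ["Zara", "Adam", "Mona", "Bela", "Yusuf", "Chris", "Nina", "Omar"]

def scan_lookup(rows, target):
    """No index: compare against every row until a match is found."""
    steps = 0
    for row in rows:
        steps += 1
        if row == target:
            return True, steps
    return False, steps

# "Building the index": keep a separate, sorted copy of the column.
sorted_names = sorted(names)

def sorted_lookup(index, target):
    """Sorted index: binary search goes straight to where the name must be."""
    pos = bisect.bisect_left(index, target)
    return pos < len(index) and index[pos] == target

found_by_scan, steps = scan_lookup(names, "Omar")
found_by_index = sorted_lookup(sorted_names, "Omar")
print(found_by_scan, steps, found_by_index)
```

The scan has to touch every entry before it reaches the last name in the unsorted list, while the binary search needs only a handful of comparisons however the original list was ordered.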

A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
You can create an index like this:
CREATE INDEX index_name
ON table_name ( column1, column2,...);
You might be working on a more complex database, so it's good to remember a few simple rules.
Indexes slow down inserts and updates, so use them carefully on columns that are FREQUENTLY updated.
Indexes speed up WHERE clauses and ORDER BY.
For further detail, you can read :
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
http://www.tutorialspoint.com/mysql/mysql-indexes.htm
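As a rough, runnable sketch of the CREATE INDEX statement above, the following uses Python's built-in sqlite3 module. SQLite stands in for MySQL here purely so the example is self-contained (its planner and its EXPLAIN output differ from MySQL's), and the events table, its columns and the index name are assumptions made up for the illustration.

```python
import sqlite3

# SQLite in memory is a stand-in for MySQL; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, create_time TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO events (create_time, payload) VALUES (?, ?)",
    [(f"2024-01-{day:02d}", f"row {day}") for day in range(1, 29)],
)

query = "SELECT * FROM events WHERE create_time >= '2024-01-21'"

def plan(sql):
    """Return the plan text SQLite reports for a query (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(query)  # no index yet, so the whole table is scanned
conn.execute("CREATE INDEX idx_events_create_time ON events (create_time)")
plan_after = plan(query)   # the planner can now search the index for the range

matching = conn.execute(query).fetchall()
print(plan_before)
print(plan_after)
print(len(matching), "rows, each with", len(matching[0]), "columns")
```

The index is a separate structure kept next to the table: the rows still have the same three columns after CREATE INDEX, and the query text does not change. Only the plan the database picks does.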

There are many kinds of index, for example a hash, a trie, or a spatial index. Which one is used depends on the values being indexed. Most likely it's a hash or a B-tree. Nothing really fancy, because the fancy structures tend to be expensive.

Related

Can I use index in MySQL in this way? [duplicate]

If I have a query like:
Select EmployeeId
From Employee
Where EmployeeTypeId IN (1,2,3)
and I have an index on the EmployeeTypeId field, does SQL Server still use that index?
Yeah, that's right. If your Employee table has 10,000 records, and only 5 records have EmployeeTypeId in (1,2,3), then it will most likely use the index to fetch the records. However, if it finds that 9,000 records have the EmployeeTypeId in (1,2,3), then it would most likely just do a table scan to get the corresponding EmployeeIds, as it's faster just to run through the whole table than to go to each branch of the index tree and look at the records individually.
SQL Server does a lot of stuff to try and optimize how the queries run. However, sometimes it doesn't get the right answer. If you know that SQL Server isn't using the index, by looking at the execution plan in query analyzer, you can tell the query engine to use a specific index with the following change to your query.
SELECT EmployeeId FROM Employee WITH (INDEX(Index_EmployeeTypeId)) WHERE EmployeeTypeId IN (1,2,3)
Assuming the index you have on the EmployeeTypeId field is named Index_EmployeeTypeId.
Usually it would, unless the IN clause covers too much of the table, and then it will do a table scan. Best way to find out in your specific case would be to run it in the query analyzer, and check out the execution plan.
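The "check the execution plan" workflow can be sketched with Python's sqlite3 module. This is an illustration under assumptions: SQLite stands in for SQL Server, its planner makes its own decisions and will not necessarily switch to a table scan at the same thresholds, and the data below is invented. The point is only how to ask the database which plan it chose for the IN query.

```python
import sqlite3

# SQLite stands in for SQL Server here; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, EmployeeTypeId INTEGER, Name TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (EmployeeTypeId, Name) VALUES (?, ?)",
    [(i % 50, f"employee {i}") for i in range(1000)],
)
conn.execute("CREATE INDEX Index_EmployeeTypeId ON Employee (EmployeeTypeId)")

sql = "SELECT EmployeeId, Name FROM Employee WHERE EmployeeTypeId IN (1, 2, 3)"

# Ask the planner what it intends to do instead of guessing.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
plan_text = " | ".join(row[-1] for row in plan_rows)
uses_index = "Index_EmployeeTypeId" in plan_text

rows = conn.execute(sql).fetchall()
print(plan_text)
print("index used:", uses_index, "- rows returned:", len(rows))
```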
Unless technology has improved in ways I can't imagine of late, the "IN" query shown will produce a result that's effectively the OR-ing of three result sets, one for each of the values in the "IN" list. The IN clause becomes an equality condition for each of the list and will use an index if appropriate. In the case of unique IDs and a large enough table then I'd expect the optimiser to use an index.
If the items in the list were to be non-unique however, and I guess in the example that a "TypeId" is a foreign key, then I'm more interested in the distribution. I'm wondering if the optimiser will check the stats for each value in the list? Say it checks the first value and finds it's in 20% of the rows (of a large enough table to matter). It'll probably table scan. But will the same query plan be used for the other two, even if they're unique?
It's probably moot - something like an Employee table is likely to be small enough that it will stay cached in memory and you probably wouldn't notice a difference between that and indexed retrieval anyway.
And lastly, while I'm preaching, beware the query in the IN clause: it's often a quick way to get something working and (for me at least) can be a good way to express the requirement, but it's almost always better restated as a join. Your optimiser may be smart enough to spot this, but then again it may not. If you don't currently performance-check against production data volumes, do so - in these days of cost-based optimisation you can't be certain of the query plan until you have a full load and representative statistics. If you can't, then be prepared for surprises in production...
So there's the potential for an "IN" clause to run a table scan, but the optimizer will try and work out the best way to deal with it?
Whether an index is used depends not so much on the type of query as on the type and distribution of data in the table(s), how up-to-date your table statistics are, and the actual datatype of the column.
The other posters are correct that an index will be used over a table scan if:
The query won't access more than a certain percent of the rows indexed (say ~10% but should vary between DBMS's).
Alternatively, if there are a lot of rows, but relatively few unique values in the column, it also may be faster to do a table scan.
The other variable that might not be that obvious is making sure that the datatypes of the values being compared are the same. In PostgreSQL, I don't think that indexes will be used if you're filtering on a float but your column is made up of ints. There are also some operators that don't support index use (again, in PostgreSQL, the ILIKE operator is like this).
As noted though, always check the query analyser when in doubt and your DBMS's documentation is your friend.
@Mike: Thanks for the detailed analysis. There are definitely some interesting points you make there. The example I posted is somewhat trivial but the basis of the question came from using NHibernate.
With NHibernate, you can write a clause like this:
int[] employeeIds = new int[]{1, 5, 23463, 32523};
NHibernateSession.CreateCriteria(typeof(Employee))
.Add(Restrictions.InG("EmployeeId",employeeIds))
NHibernate then generates a query which looks like
select * from employee where employeeid in (1, 5, 23463, 32523)
So as you and others have pointed out, it looks like there are going to be times where an index will be used or a table scan will happen, but you can't really determine that until runtime.
SELECT EmployeeId FROM Employee USE INDEX (EmployeeTypeId)
This query will search using the index you have created (assumed here to be named EmployeeTypeId). It works for me. Please give it a try.

How does optimize command change the explain

I would like to ask a question about the principles of indexing and optimization in databases.
I am using MySQL. The storage engine is MyISAM. For one query, EXPLAIN showed 8000+ rows for a table that had been well indexed. Then my colleague ran 'optimize table' on this table, and after that EXPLAIN showed 2 rows, which looked correct. The result is good, but neither of us really understands what happened and why.
I am new to this area. Can anyone help explain how the EXPLAIN output and the index can change so significantly after optimization? I thought the index would be good enough before we optimized the table.
Many thanks!
You can read the manual on OPTIMIZE TABLE here: https://dev.mysql.com/doc/refman/5.7/en/optimize-table.html
For MyISAM tables, OPTIMIZE TABLE works as follows:
If the table has deleted or split rows, repair the table.
If the index pages are not sorted, sort them.
If the table's statistics are not up to date (and the repair could not be accomplished by sorting the index), update them.
It's the last step that is most useful in your case. This is the same work that is performed by ANALYZE TABLE. Read more about what that does here: https://dev.mysql.com/doc/refman/5.7/en/analyze-table.html
Both OPTIMIZE TABLE and ANALYZE TABLE do completely different things when using InnoDB. Read the docs to learn more.
It's all about the "distribution of data" in indexes. As time passes and records are added, one index might become better suited than another. An example helps here:
Let's say you have a table with last_name and city fields and an index on each. If you search on BOTH fields, like WHERE last_name='jones' AND city='here', then either index might be used; at first they are equally good. Once one is chosen, the rows it returns are filtered on the second field with a slower, unindexed check.
Now with time, city might start to show a lot less variability than last_name. So for a search on both, city would yield too many records to filter in the second pass, whereas last_name would yield a smaller set and so be faster.
Optimize refreshes the statistics that describe this distribution, so the optimizer learns to prefer last_name over city as the data changes over time.
Hope this was clear ...
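The last_name versus city example can be made concrete with a small sketch. It uses SQLite's ANALYZE through Python's sqlite3 module as a stand-in for MySQL's ANALYZE TABLE / OPTIMIZE TABLE; the two are not the same command, and the sqlite_stat1 table read below is SQLite-specific. The table, index names and data are invented for the illustration.

```python
import sqlite3

# SQLite stands in for MySQL; people, idx_last_name and idx_city are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people (last_name, city) VALUES (?, ?)",
    # 1000 distinct last names, but only two distinct cities.
    [(f"name{i}", "here" if i % 2 else "there") for i in range(1000)],
)
conn.execute("CREATE INDEX idx_last_name ON people (last_name)")
conn.execute("CREATE INDEX idx_city ON people (city)")

# Gather per-index statistics, which the planner reads when choosing an index.
conn.execute("ANALYZE")

avg_rows_per_value = {}
for index_name, stat in conn.execute(
    "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'people'"
):
    # stat starts with: total rows in the index, then average rows per distinct value.
    total_rows, rows_per_value = stat.split()[:2]
    avg_rows_per_value[index_name] = int(rows_per_value)

print(avg_rows_per_value)
```

A lookup through idx_city would hand back far more rows per value than one through idx_last_name, which is the kind of information an optimizer needs in order to prefer the more selective index.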

MySQL search FTS vs Multiple Queries

Working on a project where schema is something like this:
id , key, value
The key and value columns are varchar, and the table is InnoDB.
A user can search on the basis of key/value pairs. What's the best way to query this in MySQL? The options I can think of are:
For each key => value, form a query and perform an inner join to get the ids matching all criteria.
Or in the background, populate a MyISAM table id, info with Full Text index on info and a single query using like '%key:value%key2:value2%'. The benefit of this will be later on if the website is popular and the table has a hundred thousand rows, I can easily port the code to Lucene but for now MySQL.
The pattern you're talking about is called relational division.
Option #1 (the self-join) is a much faster solution if you have the right indexes.
I compared the performance for a couple of solutions to relational division in my presentation SQL Query Patterns, Optimized. The self-join solution worked in 0.005 seconds even against a table with millions of rows.
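A minimal sketch of the self-join approach, run through Python's sqlite3 module so it is self-contained. Everything here is an assumption for illustration rather than the query from the presentation: the attrs table, the attr_key and attr_value column names (key is a reserved word in MySQL, so the asker's column name is avoided), the sample rows, and the composite index, which is one guess at what "the right indexes" could mean.

```python
import sqlite3

# SQLite stands in for MySQL; schema, data and index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (id INTEGER, attr_key TEXT, attr_value TEXT)")
conn.executemany(
    "INSERT INTO attrs VALUES (?, ?, ?)",
    [
        (1, "color", "red"), (1, "size", "large"),
        (2, "color", "red"), (2, "size", "small"),
        (3, "color", "blue"), (3, "size", "large"),
    ],
)
# One composite index that can serve each side of the join.
conn.execute("CREATE INDEX idx_attrs_kv ON attrs (attr_key, attr_value, id)")

# One alias of the table per required key/value pair, joined on id.
sql = """
SELECT a1.id
FROM attrs AS a1
INNER JOIN attrs AS a2 ON a2.id = a1.id
WHERE a1.attr_key = ? AND a1.attr_value = ?
  AND a2.attr_key = ? AND a2.attr_value = ?
ORDER BY a1.id
"""
matching_ids = [row[0] for row in conn.execute(sql, ("color", "red", "size", "large"))]
print(matching_ids)
```

Each additional search criterion adds one more alias and one more join, so the statement has to be generated to match the number of key/value pairs the user supplies.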
Option #2 with fulltext isn't correct anyway as you've written it, because you wouldn't use LIKE with fulltext search. You'd use MATCH(info) AGAINST('...' IN BOOLEAN MODE). I'm not sure you can use patterns in key:value format anyway. MySQL FTS prefers to match words.
@Bill Karwin
If you're going to do this for 1 condition, it will be super fast with this EAV-like schema, but if you do it for many (esp. with mixed ANDs and ORs) it will probably fall apart. The best you can hope for is some sort of super fast index merge, and that's elusive. You're going to get a temporary table in most DBMSes if you do anything fancy. I think I remember reading you're no fan of EAV, though, and maybe I'm misunderstanding you.
As I recall, a DBMS is also free to do multiple scans and then handle this with a disposable bitmap index. But fulltext indexes keep the document lists sorted and do a low-cost merge across all criteria with a FTS planner that starts strategically with the rarer keywords. That's all they do to execute "word1 & word2" all day. They're optimized for this sort of thing.
So if you have lots of simple facts, a FTS index is one decent way to do it I think. Am I missing something? You just need to change the facts to something indexable like COLORID_3, then search for "COLORID_3 & SOMETHINGELSEID_5."
If the queries involve no merging or sorting, I suspect it will be pretty much a wash. Nothing here but us BTREEs ...

How does MySQL find rows with a given content?

I am wondering how MySQL finds the rows in a table when searching like so:
select * from table where field = 'text';
Does it use a particular search algorithm? Is it practically the fastest way to look up information in a table? Or would building a search macro using another algorithm (like Boyer-Moore) work faster?
If there is an index on field, then databases often use a B-tree for indexed searches. If there is no index, then the entire table is scanned. This describes some of the techniques used in MySQL:
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
Many hours of work have gone into optimizing MySQL. Take advantage of the work already done, and resist the urge to redo it.
Without an index on field, that query can do nothing other than search every entry of the table and compare its field column against that string.
Boyer-Moore isn't needed because it's exact equality that's requested and not asking whether the field contains that string.
If you are interested in how it found those records try executing using the EXPLAIN keyword:
EXPLAIN select * from table where field = 'text';
I would recommend looking at this article to get a better understanding what is happening in the background.
I would be very surprised if you would be able to write something on your own that is faster. You could look at creating indexes on the table in question to speed up selects.

What are the biggest benefits of using INDEXES in mysql?

I know I need to have a primary key set, and to set anything that should be unique as a unique key, but what is an INDEX and how do I use them?
What are the benefits? Pros & Cons? I notice I can either use them or not, when should I?
Short answer:
Indexes speed up SELECTs and slow down INSERTs.
Usually it's better to have indexes, because they speed up SELECTs more than they slow down INSERTs.
On an UPDATE the index can speed things way up if an indexed field is used in the WHERE clause and slow things down if you update one of the indexed fields.
How do you know when to use an index
Add EXPLAIN in front of your SELECT statement.
Like so:
EXPLAIN SELECT * FROM table1
WHERE unindexedfield1 > unindexedfield2
ORDER BY unindexedfield3
This will show you how much work MySQL will have to do on each of the unindexed fields.
Using that info you can decide whether it is worthwhile to add indexes or not.
EXPLAIN can also tell you whether it is better to drop an index:
EXPLAIN SELECT * FROM table1
WHERE indexedfield1 > indexedfield2
ORDER BY indexedfield3
If very few rows are selected, or MySQL decides to ignore the index (it does that from time to time), then you might as well drop the index, because it is slowing down your inserts but not speeding up your selects.
Then again it might also be that your select statement is not clever enough.
(Sorry for the complexity in the answer, I was trying to keep it simple, but failed).
Link:
MySQL indexes - what are the best practices?
Pros:
Faster lookup for results. This is all about reducing the number of disk I/Os. Instead of scanning the entire table for the results, you can reduce the number of disk I/Os (page fetches) by using index structures such as B-trees or hash indexes to get to your data faster.
Cons:
Slower writes (potentially). Not only do you have to write your data to your tables, but you also have to write to your indexes. This may cause the system to restructure the index (hash index, B-tree, etc.), which can be very computationally expensive.
Takes up more disk space, naturally. You are storing more data.
The easiest way to think about an index is to think about a dictionary. It has words and it has definitions corresponding to those words. The dictionary has an index on "word" because when you go to a dictionary you want to look up a word quickly, then get its definition. A dictionary usually contains just one index - an index by word.
A database is analogous. When you have a bunch of data in the database, you will have certain ways that you want to get it out. Let's say you have a User table and you often look up a user by the FirstName column. Since this is an operation that you are doing often in your application, you should consider using an index on this column. That will create a structure in the database that is sorted, if you will, by that column, so that looking up something by first name is like looking up a word in a dictionary. If you didn't have this index you might need to look at ALL rows before you determine which ones have a specific FirstName. By adding an index, you have made this fast.
So why not put an index on all columns and make them all fast? Like everything, there is a trade off. Every time you insert a row into the table User, the database will need to perform its magic and sort everything on your indexed column. This can be expensive.
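The trade-off can be sketched with a toy "table plus index" in Python. This is an illustration, not how MySQL stores data: the IndexedUsers class and its fields are hypothetical, and a sorted Python list stands in for the index. A real engine would not re-sort everything on each insert; like this sketch, it places one new entry at the right position in the index, but that is still extra work on every write.

```python
import bisect

class IndexedUsers:
    """Toy table with one index on first_name (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = []           # the table: rows in insertion order
        self.by_first_name = []  # the index: sorted (first_name, row_position) pairs

    def insert(self, first_name, email):
        # Write 1: append the row to the table itself.
        self.rows.append((first_name, email))
        # Write 2: put an entry at the correct sorted position in the index.
        bisect.insort(self.by_first_name, (first_name, len(self.rows) - 1))

    def find(self, first_name):
        # Binary search for the first index entry with this name, then walk the matches.
        pos = bisect.bisect_left(self.by_first_name, (first_name, -1))
        result = []
        while pos < len(self.by_first_name) and self.by_first_name[pos][0] == first_name:
            result.append(self.rows[self.by_first_name[pos][1]])
            pos += 1
        return result

users = IndexedUsers()
for name, email in [
    ("Zoe", "zoe@example.com"),
    ("Ann", "ann@example.com"),
    ("Ann", "ann2@example.com"),
]:
    users.insert(name, email)

ann_rows = users.find("Ann")
index_order = [name for name, _ in users.by_first_name]
print(ann_rows)
print(index_order)
```

Every additional indexed column would mean one more sorted structure to update inside insert, which is why indexing all columns makes writes progressively more expensive.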
You don't have to have a primary key. Indexes (of any type) are used to speed up queries and, at least with the InnoDB engine, enforce foreign key constraints. Whether you use a unique or plain (non-unique) index depends on whether you want to allow duplicate values in the key.
This is a general database concept, you might use external resources to read about it, like http://beginner-sql-tutorial.com/sql-index.htm or http://en.wikipedia.org/wiki/Index_(database)
An index allows MySQL to find data quicker. You use them on columns that you'll be using in WHERE clauses. For example, if you have a column named score, and want to find everything with where score > 5, by default this means MySQL will need to scan through the WHOLE table to find those scores. However if you use a BTREE index, finding those that meet that condition will happen a LOT faster.
Indices have a price: disk and memory space. If it's a very big table, your index will grow rather large.
Think of it this way: what are the biggest benefits of having an index in a book? It's much the same thing. You have a slightly larger book, yet you're able to quickly look things up. When you create an index on a column, you're saying you want to be able to reference it in a where clause to look it up quickly.