I have a table with 7 columns. There is a primary key on the first column and another index (a foreign key).
My app does:
SELECT `comment_vote`.`ip`, `comment_vote`.`comment_id`, COUNT(*) AS `nb` FROM `comment_vote`
SELECT `comment_vote`.`type` FROM `comment_vote` WHERE (comment_id = 123) AND (ip = "127.0.0.1")
Is it worth adding an index on the ip column? It is often used in my SELECT queries.
By the way, is there anything I can do to speed up those queries? Sometimes they take a long time and lock the table, preventing other queries from running.
If you are searching by ip quite often then yes, you can create an index. However, your inserts and updates might take a bit longer because of it. I'm not sure how your data is structured, but if the data is collected by ip then maybe you could consider partitioning it by ip.
A good rule of thumb: If a column appears in the WHERE clause, there should be an index for it. If a query is slow, there's a good chance an index could help, particularly one that contains all fields in the WHERE clause.
In MySQL, you can use the EXPLAIN keyword to see an approximate query plan for your query, including indexes used. This should help you find out where your queries spend their time.
Yes, do create an index on ip if you're using it in other queries.
The second query filters on comment_id and ip, so I'd create an index on the combination. An index on ip alone will help that query less than a combined one.
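To make the composite-index suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; MySQL's EXPLAIN output looks different, and the index name idx_comment_ip, the reduced column list and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; idx_comment_ip and the sample data are
# hypothetical, only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment_vote ("
    " id INTEGER PRIMARY KEY, comment_id INTEGER, ip TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO comment_vote (comment_id, ip, type) VALUES (?, ?, ?)",
    [(i % 500, "10.0.%d.%d" % (i % 7, i % 250), "up") for i in range(5000)],
)

query = "SELECT type FROM comment_vote WHERE comment_id = ? AND ip = ?"
params = (123, "127.0.0.1")


def plan(sql, args):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query, params)  # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_comment_ip ON comment_vote (comment_id, ip)")
plan_after = plan(query, params)   # lookup through the composite index

print(plan_before)
print(plan_after)
```

In MySQL the equivalent check would be to put EXPLAIN in front of the SELECT before and after creating the index and compare the key column of the output.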
YES! Almost always add an INDEX or two or three! (multi-column indexes?) to every column.
If it is not in a WHERE clause today, you can bet it will be tomorrow.
Most data is WORM (written once read many times) so making the read most effective is where you will get the most value. And, as many have pointed out, the argument about having to maintain the index during a write is just plain silly.
If I have a query like:
Select EmployeeId
From Employee
Where EmployeeTypeId IN (1,2,3)
and I have an index on the EmployeeTypeId field, does SQL Server still use that index?
Yeah, that's right. If your Employee table has 10,000 records, and only 5 records have EmployeeTypeId in (1,2,3), then it will most likely use the index to fetch the records. However, if it finds that 9,000 records have the EmployeeTypeId in (1,2,3), then it would most likely just do a table scan to get the corresponding EmployeeIds, as it's faster just to run through the whole table than to go to each branch of the index tree and look at the records individually.
SQL Server does a lot of stuff to try and optimize how the queries run. However, sometimes it doesn't get the right answer. If you know that SQL Server isn't using the index, by looking at the execution plan in query analyzer, you can tell the query engine to use a specific index with the following change to your query.
SELECT EmployeeId FROM Employee WITH (INDEX(Index_EmployeeTypeId)) WHERE EmployeeTypeId IN (1,2,3)
Assuming the index you have on the EmployeeTypeId field is named Index_EmployeeTypeId.
Usually it would, unless the IN clause covers too much of the table, and then it will do a table scan. Best way to find out in your specific case would be to run it in the query analyzer, and check out the execution plan.
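As a rough way to try this out without a SQL Server instance, the sketch below asks SQLite (through Python's sqlite3 module) how it would run the same IN query. This only shows one engine's choice on made-up data; SQL Server decides from its own statistics, so treat it as an illustration of the idea rather than a prediction. The Name column and the row distribution are assumptions.

```python
import sqlite3

# SQLite is a stand-in engine; the Name column and sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    " EmployeeId INTEGER PRIMARY KEY, EmployeeTypeId INTEGER, Name TEXT)"
)
conn.executemany(
    "INSERT INTO Employee (EmployeeTypeId, Name) VALUES (?, ?)",
    [(i % 50, "employee %d" % i) for i in range(10000)],
)
conn.execute("CREATE INDEX Index_EmployeeTypeId ON Employee (EmployeeTypeId)")

sql = "SELECT EmployeeId FROM Employee WHERE EmployeeTypeId IN (1, 2, 3)"

# The last column of each EXPLAIN QUERY PLAN row describes the access path.
rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
in_plan = " | ".join(row[-1] for row in rows)

# 3 of the 50 type ids match, so the IN list selects a small share of rows.
match_count = len(conn.execute(sql).fetchall())

print(in_plan)
print(match_count, "of 10000 rows match")
```

On SQL Server itself, the execution plan in the query analyzer is the thing to check.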
Unless technology has improved in ways I can't imagine of late, the "IN" query shown will produce a result that's effectively the OR-ing of three result sets, one for each of the values in the "IN" list. The IN clause becomes an equality condition for each value in the list and will use an index if appropriate. In the case of unique IDs and a large enough table, I'd expect the optimiser to use an index.
If the items in the list were to be non-unique however, and I guess in the example that a "TypeId" is a foreign key, then I'm more interested in the distribution. I'm wondering if the optimiser will check the stats for each value in the list? Say it checks the first value and finds it's in 20% of the rows (of a large enough table to matter). It'll probably table scan. But will the same query plan be used for the other two, even if they're unique?
It's probably moot - something like an Employee table is likely to be small enough that it will stay cached in memory and you probably wouldn't notice a difference between that and indexed retrieval anyway.
And lastly, while I'm preaching, beware the query in the IN clause: it's often a quick way to get something working and (for me at least) can be a good way to express the requirement, but it's almost always better restated as a join. Your optimiser may be smart enough to spot this, but then again it may not. If you don't currently performance-check against production data volumes, do so - in these days of cost-based optimisation you can't be certain of the query plan until you have a full load and representative statistics. If you can't, then be prepared for surprises in production...
So there's the potential for an "IN" clause to run a table scan, but the optimizer will try and work out the best way to deal with it?
Whether an index is used depends less on the type of query than on the type and distribution of data in the table(s), how up-to-date your table statistics are, and the actual datatype of the column.
The other posters are correct that an index will be used over a table scan if:
The query won't access more than a certain percent of the rows indexed (say ~10% but should vary between DBMS's).
Alternatively, if there are a lot of rows, but relatively few unique values in the column, it also may be faster to do a table scan.
The other variable that might not be that obvious is making sure that the datatypes of the values being compared are the same. In PostgreSQL, I don't think that indexes will be used if you're filtering on a float but your column is made up of ints. There are also some operators that don't support index use (again, in PostgreSQL, the ILIKE operator is like this).
As noted though, always check the query analyser when in doubt and your DBMS's documentation is your friend.
@Mike: Thanks for the detailed analysis. There are definitely some interesting points you make there. The example I posted is somewhat trivial, but the basis of the question came from using NHibernate.
With NHibernate, you can write a clause like this:
int[] employeeIds = new int[] { 1, 5, 23463, 32523 };
NHibernateSession.CreateCriteria(typeof(Employee))
    .Add(Restrictions.InG("EmployeeId", employeeIds))
    .List<Employee>(); // executing the criteria is what sends the query
NHibernate then generates a query which looks like
select * from employee where employeeid in (1, 5, 23463, 32523)
So as you and others have pointed out, it looks like there are going to be times where an index will be used or a table scan will happen, but you can't really determine that until runtime.
SELECT EmployeeId FROM Employee WITH (INDEX(EmployeeTypeId))
This query will search using the index you have created (EmployeeTypeId here has to be the name of the index, not the column). It works for me. Please give it a try.
First of all, I am still very new to PHP / MySQL, so excuse my question if it is too simple for you :)
Here is my issue: I am currently working on storing a lot of data in a MySQL database. It's basically a directory-like archive for my son's school.
The structure is basically:
id, keyword, title, description, url, rank, hash
id is int 11
keyword is varchar 255
title is varchar 255
description is text
url is varchar 255
rank is int 2
hash is varchar 50
We plan to insert about 10 million rows containing the fields above and my mission is being able to query the database as fast as possible.
My query is always for an exact keyword.
For example:
select * from table where keyword = "keyword" limit 10
I really just need to query the keyword and not the title or description or anything else. There are a maximum of 10 results for each keyword - never more.
I have no real clue about mysql indexes and stuff but I read that it can improve speed if you have some indexes.
Now here is where I need help from a Pro. My mission is being able to run the fastest possible query, so it doesn't take too long to query the database. Since I am only looking up the keyword field, I am sure there is a way to make sure that even if you have millions of rows, that the results can be returned quickly.
What would you suggest I do? Should I set the keyword field to INDEX, or do I have to watch out for anything else? Since I have no real clue about indexes, your help is appreciated: I don't know if I should use indexes at all, or if I have to use them for everything like keyword, title, description and so on...
The database is updated frequently - in case it matters.
Do you think it's even possible to store millions of rows and run a query in less than a second?
Any other suggestions such as custom my.cnf settings etc would be also helpful.
Your help is greatly appreciated.
Your intuition is correct - if you are only filtering on keyword in your WHERE clause, it should be indexed and you likely will see some execution speed improvement if you do.
CREATE INDEX `idx_keyword` ON `yourtable` (`keyword`)
You may be using a client like phpMyAdmin, which makes index creation easier than executing commands by hand, but it is still worth reviewing the MySQL documentation on CREATE INDEX. Yours is a very run-of-the-mill case, so you won't need any special options.
Although this isn't the case for you (as you said there would be up to 10 rows per keyword), if you already had a unique constraint or PRIMARY KEY or FOREIGN KEY defined on keyword, it would function as an index as well.
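For a feel of what the index changes, here is a small sketch that runs the exact-keyword lookup against an in-memory SQLite database from Python. SQLite is only a convenient stand-in for MySQL, the column list is shortened, and the generated keywords (10 rows each, as in the question) are invented sample data.

```python
import sqlite3

# Stand-in engine and made-up data; table and index names follow the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable ("
    " id INTEGER PRIMARY KEY, keyword TEXT, title TEXT, url TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable (keyword, title, url) VALUES (?, ?, ?)",
    [("kw%d" % (i % 2000), "title %d" % i, "http://example.com/%d" % i)
     for i in range(20000)],
)
conn.execute("CREATE INDEX idx_keyword ON yourtable (keyword)")

sql = "SELECT * FROM yourtable WHERE keyword = ? LIMIT 10"

# Ask the engine how it will run the lookup, then run it.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("kw42",)).fetchall()
keyword_plan = " | ".join(row[-1] for row in plan_rows)
hits = conn.execute(sql, ("kw42",)).fetchall()

print(keyword_plan)
print(len(hits), "rows returned")
```

With the index in place the engine jumps straight to the matching keyword instead of reading all rows, which is why the row count of the table matters much less for this kind of query.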
Add an index on the keyword column. It will increase the speed significantly. Then it should be no problem to query the data in milliseconds.
In general you should put an index on the fields you use in your WHERE clause. That way the DB can narrow down the data really fast and return the results.
I know I need to have a primary key set, and to set anything that should be unique as a unique key, but what is an INDEX and how do I use them?
What are the benefits? Pros & Cons? I notice I can either use them or not, when should I?
Short answer:
Indexes speed up SELECTs and slow down INSERTs.
Usually it's better to have indexes, because they speed up selects more than they slow down inserts.
On an UPDATE the index can speed things way up if an indexed field is used in the WHERE clause, and slow things down if you update one of the indexed fields.
How do you know when to use an index?
Add EXPLAIN in front of your SELECT statement.
Like so:
EXPLAIN SELECT * FROM table1
WHERE unindexfield1 > unindexedfield2
ORDER BY unindexedfield3
This will show you how much work MySQL will have to do on each of the unindexed fields.
Using that info you can decide if it is worthwhile to add indexes or not.
EXPLAIN can also tell you if it is better to drop an index:
EXPLAIN SELECT * FROM table1
WHERE indexedfield1 > indexedfield2
ORDER BY indexedfield3
If the index eliminates very few rows, or MySQL decides to ignore the index (it does that from time to time), then you might as well drop the index, because it is slowing down your inserts but not speeding up your selects.
Then again it might also be that your select statement is not clever enough.
(Sorry for the complexity in the answer, I was trying to keep it simple, but failed).
Link:
MySQL indexes - what are the best practices?
Pros:
Faster lookup of results. This is all about reducing the number of disk I/Os. Instead of scanning the entire table for the results, you can reduce the number of disk I/Os (page fetches) by using index structures such as B-trees or hash indexes to get to your data faster.
Cons:
Slower writes (potentially). Not only do you have to write your data to your tables, but you also have to write to your indexes. This may cause the system to restructure the index (hash index, B-tree etc.), which can be computationally expensive.
Takes up more disk space, naturally. You are storing more data.
The easiest way to think about an index is to think about a dictionary. It has words and it has definitions corresponding to those words. The dictionary has an index on "word" because when you go to a dictionary you want to look up a word quickly, then get its definition. A dictionary usually contains just one index - an index by word.
A database is analogous. When you have a bunch of data in the database, you will have certain ways that you want to get it out. Let's say you have a User table and you often look up a user by the FirstName column. Since this is an operation that you are doing often in your application, you should consider using an index on this column. That will create a structure in the database that is sorted, if you will, by that column, so that looking up something by first name is like looking up a word in a dictionary. If you didn't have this index you might need to look at ALL rows before you determine which ones have a specific FirstName. By adding an index, you have made this fast.
So why not put an index on all columns and make them all fast? Like everything, there is a trade-off. Every time you insert a row into the User table, the database has to do extra work to keep each index in sorted order. This can be expensive.
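The dictionary analogy can be sketched in a few lines of Python. This is a toy model, not how any particular database implements its indexes: a plain list plays the table, a sorted list of (value, row position) pairs plays the index, and the names scan, lookup and insert are invented for the example.

```python
import bisect

# The "table": rows in insertion order (only the FirstName column is shown).
first_names = ["Maria", "Alex", "Zoe", "Chen", "Alex", "Priya", "Omar"]


def scan(rows, name):
    """Without an index: look at every row."""
    return [i for i, value in enumerate(rows) if value == name]


# The "index": (value, row position) pairs kept in sorted order.
index = sorted((value, i) for i, value in enumerate(first_names))


def lookup(idx, name):
    """With the index: binary-search to the first match, then read on."""
    pos = bisect.bisect_left(idx, (name, -1))
    hits = []
    while pos < len(idx) and idx[pos][0] == name:
        hits.append(idx[pos][1])
        pos += 1
    return hits


def insert(rows, idx, name):
    """The trade-off: every insert must also update the sorted index."""
    rows.append(name)
    bisect.insort(idx, (name, len(rows) - 1))


print(scan(first_names, "Alex"))    # positions found by reading everything
print(lookup(index, "Alex"))        # same positions found via the index
insert(first_names, index, "Chen")
print(lookup(index, "Chen"))
```

Both functions return the same row positions; the difference is that lookup touches only a handful of entries while scan touches all of them, and insert pays for that by doing extra work on every write.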
You don't have to have a primary key. Indexes (of any type) are used to speed up queries and, at least with the InnoDB engine, enforce foreign key constraints. Whether you use a unique or plain (non-unique) index depends on whether you want to allow duplicate values in the key.
This is a general database concept, you might use external resources to read about it, like http://beginner-sql-tutorial.com/sql-index.htm or http://en.wikipedia.org/wiki/Index_(database)
An index allows MySQL to find data quicker. You use them on columns that you'll be using in WHERE clauses. For example, if you have a column named score and want to find everything with score > 5, by default MySQL will need to scan through the WHOLE table to find those scores. However, if you use a BTREE index, finding the rows that meet that condition will happen a LOT faster.
Indices have a price: disk and memory space. If it's a very big table, your index will grow rather large.
Think of it this way: what are the biggest benefits of having an index in a book? It's much the same thing. You have a slightly larger book, yet you're able to quickly look things up. When you create an index on a column, you're saying you want to be able to reference it in a where clause to look it up quickly.
I have a table with about 200,000 records.
It takes a long time to do a simple select query. I am confused, because I am running on a 4-core CPU with 4GB of RAM.
How should I write my query?
Or does it have anything to do with INDEXING?
Important note: my table is static (its data won't change).
What are your solutions?
PS
1 - My table has a primary key, id.
2 - My table has a unique key, serial.
3 - I want to query on the other fields, like where param_12 not like '%I.S%'
or where param_13 = '1'
4 - 200,000 is not big, and this is exactly why I am surprised.
5 - I even have problems when adding a simple field: my question
6 - Can I create an INDEX for BOOL fields? (Or is it useful?)
PS: thanks for the answers.
7 - My select should return the rows whose param_12 contains 'I.S', or the ones where it does not.
select * from `table` where `param_12` like '%I.S%'
This is all I want. It seems no index helps here. Hmm?
Indexing will help. Please post table definition and select query.
Add index for all "=" columns in where clause.
Yes, you'll want/need to index this table, and partitioning could be helpful as well. To do this properly you will need to provide more information. You'll want to use EXPLAIN and look over your queries to determine which columns to index and how.
Another aspect to consider is whether or not your table is normalized. Normalized tables tend to give better performance due to lowered I/O.
I realize this is vague, but without more information that's about as specific as we can be.
BTW: a table of 200,000 rows is relatively small.
Here is another SO question you may find useful
1 - my table has a primary key id: Not really useful unless you use some scheme which requires a numeric primary key.
2 - my table has a unique key serial: The id is also unique by definition; why not use serial as the primary key? This one is automatically indexed because you defined it as unique.
3 - i want to query over the other fields like where param_12 not like '%I.S%' or where param_13 = '1': A like '%something%' query cannot really use an index. Is there some way you can split param_12 into param_12a, which holds whatever the first % matches, and param_12b, which can be matched with 'I.S%'? An index can be used for a LIKE if the starting string is known.
4 - 200,000 is not big and this is exactly why i am surprised: Yep, 200,000 is not that much. But without good indexes, queries and/or cache size, MySQL will need to read all data from disk for comparison, which is slow.
5 - i even have problem when adding a simple field: my question
6 - can i create an INDEX for BOOL fields? Yes you can, but an index which matches half of the time is fairly useless. An index is used to limit, as much as possible, the number of records MySQL has to load fully. If an index does not dramatically limit that number, as is often the case with a boolean (in a 50-50 distribution), using the index only requires more disk I/O and can slow searching down. So unless you expect something like an 80-20 distribution or better, creating an index will cost time rather than save it.
An index on param_13 might be used, but not one on param_12 in this example, since the leading % in LIKE '%...' prevents the index from being used.
If you're querying data with LIKE '%asdasdasd%' then no index can help you. It will have to do a full scan every time. The problem here is the leading % because that means that the substring you are looking for can be anywhere in the field - so it has to check it all.
Possibly you might look into full-text indexing, but depending on your needs that might not be appropriate.
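The difference between a leading-wildcard LIKE and a prefix LIKE can be checked with a small experiment. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for MySQL, so the details are engine-specific: in particular the COLLATE NOCASE on the index is something SQLite needs for its case-insensitive LIKE and is not a MySQL requirement. The table name t, the index name idx_param_12 and the sample values are assumptions.

```python
import sqlite3

# Stand-in engine; table t, idx_param_12 and the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, param_12 TEXT, param_13 TEXT)"
)
sample = []
for i in range(2000):
    value = "I.S-%d" % i if i % 4 == 0 else "X-%d" % i
    sample.append((value, str(i % 2)))
conn.executemany("INSERT INTO t (param_12, param_13) VALUES (?, ?)", sample)

# SQLite-specific: its LIKE is case-insensitive by default, so the index
# must use NOCASE collation before a prefix LIKE can be served from it.
conn.execute("CREATE INDEX idx_param_12 ON t (param_12 COLLATE NOCASE)")


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Leading wildcard: the match can start anywhere, so every row is checked.
infix_plan = plan("SELECT * FROM t WHERE param_12 LIKE '%I.S%'")
# Known prefix: the engine can turn the pattern into an index range lookup.
prefix_plan = plan("SELECT * FROM t WHERE param_12 LIKE 'I.S%'")

print(infix_plan)
print(prefix_plan)
```

The same comparison in MySQL would be two EXPLAIN statements, one per pattern, looking at whether the key column names the index.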
Firstly, ensure your table has a primary key.
To answer in any more detail than that you'll need to provide more information about the structure of the table and the types of queries you are running.
I don't believe that the keys you have will help. You have to index on the columns used in WHERE clauses.
I'd also wonder if the LIKE requires table scans regardless of indexes. The minute you use a function like that you lose the value of the index, because you have to check each and every row.
You're right: 200K isn't a huge table. EXPLAIN will help here. If it shows a full table scan (type ALL in MySQL), redesign.
I am dealing with MySQL tables that are essentially results of raytracing simulations on a simulated office room with a single venetian blind. I usually need to retrieve the simulation's result for a unique combination of time and blind's settings. So I end up doing a lot of
SELECT result FROM results WHERE timestamp='2005-05-05 12:30:25'
AND opening=40 AND slatangle=60
This looks suspiciously optimizable, since this query should never ever return more than one row. Does it make sense to define an index on the three columns that uniquely identify each row? Or are there other techniques I can use?
The answer is most definitely a yes. If you define a unique index on timestamp, opening and slatangle, MySQL should be able to find your row with very few disk seeks.
You might experiment with creating an index on timestamp, opening, slatangle and result. MySQL may be able to fetch your data from the index without touching the datafile at all.
The MySQL Manual has a section about optimizing queries.
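Both suggestions from this answer can be tried in a small sketch. It runs on SQLite via Python's sqlite3 module as a stand-in for MySQL, and the index names idx_results_key and idx_results_cover, the column types and the single sample row are assumptions. The first index is dropped before the second is created only so that each plan can be seen in isolation.

```python
import sqlite3

# Stand-in engine; index names, column types and the sample row are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    " timestamp TEXT, opening INTEGER, slatangle INTEGER, result REAL)"
)
conn.execute(
    "INSERT INTO results VALUES ('2005-05-05 12:30:25', 40, 60, 512.5)"
)

sql = (
    "SELECT result FROM results"
    " WHERE timestamp = ? AND opening = ? AND slatangle = ?"
)
args = ("2005-05-05 12:30:25", 40, 60)


def plan():
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return " | ".join(row[-1] for row in rows)


# Suggestion 1: a unique index on the three identifying columns.
conn.execute(
    "CREATE UNIQUE INDEX idx_results_key"
    " ON results (timestamp, opening, slatangle)"
)
key_plan = plan()

# Suggestion 2: also include result, so the index alone answers the query.
conn.execute("DROP INDEX idx_results_key")
conn.execute(
    "CREATE INDEX idx_results_cover"
    " ON results (timestamp, opening, slatangle, result)"
)
cover_plan = plan()

value = conn.execute(sql, args).fetchone()[0]
print(key_plan)
print(cover_plan)
print(value)
```

In practice the unique index would normally stay, since it also guards against duplicate combinations; whether the wider index is worth its extra size is something to measure on real data.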
I would suggest adding
LIMIT 1;
to the end of the query.
William
I wouldn't suggest adding 3 separate indexes. A single index on all three columns may be better, and making that combination the primary key (or a unique key) would be best, but only if you're sure it is unique.
Yes, creating an index on multiple columns helps. You should also test the performance of different column orders, i.e. an index on (c1, c2, c3) does not perform the same as one on (c2, c1, c3).
Have a look
http://joekuan.wordpress.com/2009/01/23/mysql-optimize-your-query-to-be-more-scalable-part-12/
http://joekuan.wordpress.com/2009/01/23/mysql-optimize-your-query-to-be-more-scalable-part-22/
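One reason column order in a multi-column index matters is that a query which skips the leading column usually cannot seek into the index. The sketch below shows this on SQLite through Python's sqlite3 module; it is a stand-in for MySQL, and the table t, the payload column and the index name idx_c1_c2_c3 are assumptions made for the example.

```python
import sqlite3

# Stand-in engine; table t, payload and idx_c1_c2_c3 are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (c1 INTEGER, c2 INTEGER, c3 INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_c1_c2_c3 ON t (c1, c2, c3)")


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Filters on the leading columns of the index: an index lookup is possible.
leading_plan = plan("SELECT payload FROM t WHERE c1 = 1 AND c2 = 2")
# Skips c1, the first index column: the engine falls back to a scan.
non_leading_plan = plan("SELECT payload FROM t WHERE c2 = 2 AND c3 = 3")

print(leading_plan)
print(non_leading_plan)
```

So the order to test first is the one that puts the columns your queries always filter on at the front of the index.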