MySQL: a huge table. Can't query, even a simple select!

I have a table with about 200,000 records.
It takes a long time to run even a simple SELECT query. I am confused, because I am running on a 4-core CPU with 4 GB of RAM.
How should I write my query?
Or does this have something to do with INDEXING?
Important note: my table is static (its data won't change).
What are your solutions?
PS
1 - My table has a primary key, id.
2 - My table has a unique key, serial.
3 - I want to query on the other fields, like where param_12 not like '%I.S%'
or where param_13 = '1'
4 - 200,000 is not big, and this is exactly why I am surprised.
5 - I even have problems when adding a simple field: my question
6 - Can I create an INDEX on BOOL fields? (Or is it useful?)
PS and thanks for the answers
7 - My select should return the rows that either contain the specified 'I.S' or do not.
select * from `table` where `param_12` like '%I.S%'
This is all I want. It seems no index helps here. Hmm?

Indexing will help. Please post table definition and select query.
Add index for all "=" columns in where clause.
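A minimal sketch of that advice, assuming the table is called mytable (a placeholder, since the question never gives the real name) and that param_13 is the column compared with "=":

```sql
-- mytable and idx_param_13 are hypothetical names; param_13 comes from the question.
ALTER TABLE mytable ADD INDEX idx_param_13 (param_13);

-- Check whether the optimizer actually picks the new index:
-- look at the "key" column of the output.
EXPLAIN SELECT * FROM mytable WHERE param_13 = '1';
```

If param_13 turns out to be a TEXT or BLOB column, MySQL requires a prefix length in the index definition, for example (param_13(20)).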

Yes, you'll want/need to index this table, and partitioning would be helpful as well. Doing this properly requires more information from you. You'll want to run EXPLAIN on your queries to determine which columns to index and how.
Another aspect to consider is whether or not your table is normalized. Normalized tables tend to give better performance due to lowered I/O.
I realize this is vague, but without more information that's about as specific as we can be.
BTW: a table of 200,000 rows is relatively small.
Here is another SO question you may find useful

1 - my table has a primary key id: Not really useful unless you use some scheme which requires a numeric primary key.
2 - my table has a unique key serial: The id is also unique by definition; why not use serial as the primary key? It is automatically indexed because you defined it as unique.
3 - i want to query over the other fields like where param_12 not like '%I.S%' or where param_13 = '1': A like '%something%' query cannot really use an index. Is there some way you can split param_12 into param_12a, which holds the part matched by the first %, and param_12b, which starts with 'I.S'? An index can be used for a LIKE if the starting string is known.
4 - 200,000 is not big and this is exactly why i am surprised: yep, 200,000 is not that much. But without good indexes, good queries and/or enough cache, MySQL will need to read all data from disk for comparison, which is slow.
5 - i even have problem when adding a simple field: my question
6 - can i create an INDEX for BOOL fields? Yes you can, but an index which matches half of the time is fairly useless. An index is used to limit, as much as possible, the number of records MySQL has to load in full. If an index does not dramatically limit that number, as is often the case with a boolean in a 50-50 distribution, using the index only requires more disk IO and can slow searching down. So unless you expect something like an 80-20 distribution or better, creating an index will cost time rather than save it.
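One way to find out which side of that 50-50 versus 80-20 line a column falls on is simply to count the values. A small sketch, again assuming the placeholder table name mytable:

```sql
-- mytable is a hypothetical name; param_13 stands in for the boolean-like column.
-- Each output row shows one distinct value and how many rows carry it.
SELECT param_13, COUNT(*) AS rows_per_value
FROM mytable
GROUP BY param_13
ORDER BY rows_per_value DESC;
```

If one value accounts for only a small share of the rows, an index is more likely to pay off for queries that look for that rare value.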

An index on param_13 might be used, but not one on param_12 in this example, since a LIKE pattern that starts with '%' prevents the index from being used.

If you're querying data with LIKE '%asdasdasd%' then no index can help you. It will have to do a full scan every time. The problem here is the leading % because that means that the substring you are looking for can be anywhere in the field - so it has to check it all.
Possibly you might look into full-text indexing, but depending on your needs that might not be appropriate.
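To see the leading-wildcard problem for yourself, something like the following may help. It assumes a table called mytable with an ordinary index on param_12; both are placeholders, not details from the question.

```sql
-- Leading wildcard: the match can start anywhere in the value,
-- so expect type = ALL (full table scan) in the EXPLAIN output.
EXPLAIN SELECT * FROM mytable WHERE param_12 LIKE '%I.S%';

-- Known prefix: a B-tree index on param_12 can be used here
-- (type = range), if the optimizer judges it selective enough.
EXPLAIN SELECT * FROM mytable WHERE param_12 LIKE 'I.S%';
```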

Firstly, ensure your table has a primary key.
To answer in any more detail than that you'll need to provide more information about the structure of the table and the types of queries you are running.

I don't believe that the keys you have will help. You have to index on the columns used in WHERE clauses.
I'd also wonder if the LIKE requires table scans regardless of indexes. The minute you use a function like that you lose the value of the index, because you have to check each and every row.
You're right: 200K isn't a huge table. EXPLAIN will help here. If you see a full table scan (type ALL in the output), redesign.

Related

MySQL - How to optimize table for speed

First of all, I am still very new to PHP / mySQL so excuse my question if it is too simple for you :)
Here is my issue: I am currently working on storing a lot of data into a mysql database. It's basically a directory like archive for my son's school.
The structure is basically:
id, keyword, title, description, url, rank, hash
id is int 11
keyword is varchar 255
title is varchar 255
description is text
url is varchar 255
rank is int 2
hash is varchar 50
We plan to insert about 10 million rows containing the fields above and my mission is being able to query the database as fast as possible.
My query is always for an exact keyword.
For example:
select * from table where keyword = "keyword" limit 10
I really just need to query the keyword and not the title or description or anything else. There are a maximum of 10 results for each keyword - never more.
I have no real clue about mysql indexes and stuff but I read that it can improve speed if you have some indexes.
Now here is where I need help from a Pro. My mission is being able to run the fastest possible query, so it doesn't take too long to query the database. Since I am only looking up the keyword field, I am sure there is a way to make sure that even if you have millions of rows, that the results can be returned quickly.
What would you suggest that I do? Should I put an INDEX on the keyword field, or do I have to watch out for anything else? I have no real clue about INDEXES, so your help is appreciated: I don't know if I should use indexes at all, or if I have to use them for everything, like keyword, title, description and so on...
The database is updated frequently - in case it matters.
Do you think it's even possible to store millions of rows and run a query in less than a second?
Any other suggestions such as custom my.cnf settings etc would be also helpful.
Your help is greatly appreciated.
Your intuition is correct - if you are only filtering on keyword in your WHERE clause, it should be indexed and you likely will see some execution speed improvement if you do.
CREATE INDEX `idx_keyword` ON `yourtable` (`keyword`)
You may be using a client like phpMyAdmin, which makes index creation easier than executing commands by hand, but do review the MySQL documentation on CREATE INDEX. Yours is a very run-of-the-mill case, so you won't need any special options.
Although this isn't the case for you (as you said there would be up to 10 rows per keyword), if you already had a unique constraint or PRIMARY KEY or FOREIGN KEY defined on keyword, it would function as an index as well.
Add an index on the keyword column. It will increase the speed significantly. Then it should be no problem to query the data in milliseconds.
In general you should put an index on the fields you are using in your where clause. That way the DB can narrow down the data really fast and return the results.
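Put together, the table from the question might be declared roughly like this. The table name, the NOT NULL constraints and AUTO_INCREMENT are my assumptions; only the column names and types come from the question.

```sql
-- directory_entries is a hypothetical table name.
CREATE TABLE directory_entries (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    keyword     VARCHAR(255) NOT NULL,
    title       VARCHAR(255) NOT NULL,
    description TEXT,
    url         VARCHAR(255) NOT NULL,
    `rank`      INT NOT NULL,          -- backticks: RANK is a reserved word in MySQL 8.0
    hash        VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_keyword (keyword)          -- the only secondary index the lookup needs
);

-- The "key" column of the output should show idx_keyword.
EXPLAIN SELECT * FROM directory_entries WHERE keyword = 'keyword' LIMIT 10;
```

Since the table is updated frequently, keeping it to this one secondary index also keeps the write overhead small.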

How does MySQL process indexes when a table has more than one index?

Sample table
field 0 : no(PK)
field 1 : title
field 2 : description
field 3 : category1(INDEX)
field 4 : category2(INDEX)
field 5 : category3(INDEX)
field 6 : category4(INDEX)
field 7 : category5(INDEX)
Above is a sample table that I will use on my website; each category field has its own index.
Suppose I execute a command like the one below:
select * from table where category1=1 and category2=2 and category3=3 and category4=4 and category5=5
How does a table with only one category field compare to a table with many category fields, like the one above? Which one is better?
I figured that, of course, a table with only one category field is the good choice.
But I really don't know the details of how indexes are evaluated.
I have to explain the difference between them to my boss!
So I would like some information, with a "sample" showing index cost, sample data, the calculation process, or anything else that would help me understand how index evaluation works.
In general, if you have a query with more than one WHERE constraint, the best index to have is a compound index which contains all the fields that are constrained - in your case that is an index on (category1, category2, category3, category4, category5).
However, in practice it is really wasteful to have many such compound indexes. Also, an index is only useful if it has high selectivity. For example, if you have a field which takes the values 0 or 1 with equal probability (selectivity 1/2), it is almost always NOT worth creating an index on such a field, or even including it in a compound index.
At any rate, always try running EXPLAIN (or EXPLAIN ANALYZE, available in MySQL 8.0.18 and later) to get an idea of what the query planner is thinking and which index it will choose. If you see a full table scan, it may be a reason to worry, but not always (for example, the planner may decide that using a low-selectivity index is not worth it).
You can analyze what the execution engine will do using EXPLAIN EXTENDED query-phrase (plain EXPLAIN in MySQL 8.0, where the EXTENDED keyword was removed). Best case scenario is that MySQL will use an index merge. This means that it will select every option via its own index, then merge the result sets without any index help. Usually, a composite index is much faster, but that might depend on the number of records and the usage scenario (high or low turnover of records).
As mvp already wrote, use EXPLAIN to see how the query optimizer would handle your query. In general MySQL uses one index per table you access to fetch the data you are looking for. When several indexes are possible, the optimizer tries to pick the one with the highest selectivity.
E.g. you might have a query like yours:
SELECT * FROM table WHERE category1=1 AND category2=2 AND category3=3 AND category4=4 AND category5=5
It would be possible to use a combined index that contains category1, category2, category3, category4 and category5 or also a combined index that contains only category1 and category2. The optimizer would decide at runtime which one it would take.
Another common example would be:
SELECT * FROM table WHERE category1=1 OR category2=2
The query optimizer can only use an index for category1 OR category2 but not both! At least this was what MySQL EXPLAIN returned for me; MySQL 5.0 and later can in some cases combine both indexes with an index merge, which EXPLAIN reports as type index_merge. It might be possible for other databases to run both selections in parallel and simply join the two results and remove duplicates.
Before you start adding lots of indexes, remember the overhead they produce. If you have many more reads than writes, it might work out. But if you also have many insert or update operations, the indexes need to be adjusted every time, which causes additional load and increases the execution time of those statements.
As a follow-up I recommend this MySQL chapter: How MySQL uses indexes
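To make the two cases above concrete, here is a rough sketch. The table name items and the index name are made up for illustration; the columns are the ones from the question.

```sql
-- items and idx_categories are hypothetical names.
-- One compound index covering all five equality conditions.
ALTER TABLE items
    ADD INDEX idx_categories (category1, category2, category3, category4, category5);

-- The "key" column shows which index the optimizer chose.
EXPLAIN SELECT * FROM items
WHERE category1 = 1 AND category2 = 2 AND category3 = 3
  AND category4 = 4 AND category5 = 5;

-- The OR case rewritten as a UNION, so that each branch can use
-- its own single-column index; UNION also removes duplicate rows.
SELECT * FROM items WHERE category1 = 1
UNION
SELECT * FROM items WHERE category2 = 2;
```

Whether the UNION form is actually faster than the plain OR depends on the data and the MySQL version, so compare both with EXPLAIN before committing to one.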

What are the biggest benefits of using INDEXES in mysql?

I know I need to have a primary key set, and to set anything that should be unique as a unique key, but what is an INDEX and how do I use them?
What are the benefits? Pros & Cons? I notice I can either use them or not, when should I?
Short answer:
Indexes speed up SELECTs and slow down INSERTs.
Usually it's better to have indexes, because they speed up selects more than they slow down inserts.
On an UPDATE the index can speed things way up if an indexed field is used in the WHERE clause, and slow things down if you update one of the indexed fields.
How do you know when to use an index?
Add EXPLAIN in front of your SELECT statement.
Like so:
EXPLAIN SELECT * FROM table1
WHERE unindexedfield1 > unindexedfield2
ORDER BY unindexedfield3
This will show you how much work MySQL will have to do on each of the unindexed fields.
Using that info you can decide if it is worthwhile to add indexes or not.
EXPLAIN can also tell you if it is better to drop an index:
EXPLAIN SELECT * FROM table1
WHERE indexedfield1 > indexedfield2
ORDER BY indexedfield3
If very few rows are selected, or MySQL decides to ignore the index (it does that from time to time), then you might as well drop the index, because it is slowing down your inserts but not speeding up your selects.
Then again it might also be that your select statement is not clever enough.
(Sorry for the complexity in the answer, I was trying to keep it simple, but failed).
Link:
MySQL indexes - what are the best practices?
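For what it's worth, here is roughly how the EXPLAIN workflow from the short answer could look in practice. The constant 100 and the index name idx_field1 are made up for the example; table1 and the field names follow the answer.

```sql
-- List the indexes that currently exist on the table.
SHOW INDEX FROM table1;

-- Columns worth reading in the output:
--   type          ALL means a full table scan
--   possible_keys indexes the optimizer considered
--   key           the index it actually chose (NULL if none)
--   rows          estimated number of rows it has to examine
--   Extra         e.g. "Using filesort" when ORDER BY needs a separate sort
EXPLAIN SELECT * FROM table1
WHERE indexedfield1 > 100
ORDER BY indexedfield3;

-- If an index never shows up under "key" for any of your queries,
-- dropping it saves work on every insert (idx_field1 is hypothetical).
ALTER TABLE table1 DROP INDEX idx_field1;
```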
Pros:
Faster lookup for results. This is all about reducing the number of disk I/Os. Instead of scanning the entire table for the results, you can reduce the number of disk I/Os (page fetches) by using index structures such as B-Trees or hash indexes to get to your data faster.
Cons:
Slower writes (potentially). Not only do you have to write your data to your tables, but you also have to write to your indexes. This may cause the system to restructure the index (hash index, B-Tree etc.), which can be very computationally expensive.
Takes up more disk space, naturally. You are storing more data.
The easiest way to think about an index is to think about a dictionary. It has words and it has definitions corresponding to those words. The dictionary has an index on "word" because when you go to a dictionary you want to look up a word quickly, then get its definition. A dictionary usually contains just one index - an index by word.
A database is analogous. When you have a bunch of data in the database, you will have certain ways that you want to get it out. Let's say you have a User table and you often look up a user by the FirstName column. Since this is an operation that you are doing often in your application, you should consider using an index on this column. That will create a structure in the database that is sorted, if you will, by that column, so that looking up something by first name is like looking up a word in a dictionary. If you didn't have this index you might need to look at ALL rows before you determine which ones have a specific FirstName. By adding an index, you have made this fast.
So why not put an index on all columns and make them all fast? Like everything, there is a trade off. Every time you insert a row into the table User, the database will need to perform its magic and sort everything on your indexed column. This can be expensive.
You don't have to have a primary key. Indexes (of any type) are used to speed up queries and, at least with the InnoDB engine, enforce foreign key constraints. Whether you use a unique or plain (non-unique) index depends on whether you want to allow duplicate values in the key.
This is a general database concept, you might use external resources to read about it, like http://beginner-sql-tutorial.com/sql-index.htm or http://en.wikipedia.org/wiki/Index_(database)
An index allows MySQL to find data quicker. You use them on columns that you'll be using in WHERE clauses. For example, if you have a column named score, and want to find everything with where score > 5, by default this means MySQL will need to scan through the WHOLE table to find those scores. However if you use a BTREE index, finding those that meet that condition will happen a LOT faster.
Indices have a price: disk and memory space. If it's a very big table, your index will grow rather large.
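A quick sketch of the score example, assuming a table called results (my name, not from the question):

```sql
-- results and idx_score are hypothetical names.
-- BTREE is already the default index type for InnoDB; it is spelled out here only for clarity.
CREATE INDEX idx_score USING BTREE ON results (score);

-- With the index in place this can become a range scan (type = range).
-- If most rows have score > 5, the optimizer may still prefer a full scan.
EXPLAIN SELECT * FROM results WHERE score > 5;
```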
Think of it this way: what are the biggest benefits of having an index in a book? It's much the same thing. You have a slightly larger book, yet you're able to quickly look things up. When you create an index on a column, you're saying you want to be able to reference it in a where clause to look it up quickly.

MySQL index question

I've been reading about indexes in MySQL recently, and some of the principles are quite straightforward but one concept is still bugging me: basically, if in a hypothetical table with, let's say, 10 columns, we have two single-column indexes (for column01 and column02 respectively), plus a primary key column (some other column), then are they going to be used in a simple SELECT query like this one or not:
SELECT * FROM table WHERE column01 = 'aaa' AND column02 = 'bbb'
Looking at it, my first instinct is telling me that the first index is going to retrieve a set of rows (or primary keys in InnoDB, if I got the idea right) that satisfy the first condition, and the second index will get another set. And the final result set will be just the intersection of these two. In the books that I've been going through I cannot find anything about this particular scenario. Of course, for this particular query one index on both columns seems like the best option, but I am struggling with understanding the real process behind this whole thing if I try to use two indexes that I described above.
It's only going to use a single index in most cases (MySQL can occasionally combine two with an index merge, which EXPLAIN reports as type index_merge). You need to create a composite index of multiple columns if you want it to be able to index off of each column you are testing. You may want to read the manual to find out how MySQL uses each type of index, and how to order your composite indexes correctly to get the best utilization of them.
It's actually the most common question about indexing at all: is it better to have one index with all columns or one individual index for every column?
http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/index-combine-performance
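As a rough illustration of the composite-index option, assuming the hypothetical table is simply called t:

```sql
-- t and idx_col01_col02 are made-up names; column01/column02 are from the question.
ALTER TABLE t ADD INDEX idx_col01_col02 (column01, column02);

-- Both conditions can be resolved inside the one composite index.
EXPLAIN SELECT * FROM t WHERE column01 = 'aaa' AND column02 = 'bbb';

-- The leftmost column alone can still use the same index.
EXPLAIN SELECT * FROM t WHERE column01 = 'aaa';

-- column02 alone is not a leftmost prefix, so this index
-- cannot be used to look up the matching rows directly.
EXPLAIN SELECT * FROM t WHERE column02 = 'bbb';
```

With only the two single-column indexes in place, watch the type column of the first EXPLAIN: index_merge there means MySQL is doing the intersection described in the question.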

MySQL: low cardinality/selectivity columns = how to index?

I need to add indexes to my table (columns) and stumbled across this post:
How many database indexes is too many?
Quote:
“Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index.”
Is an Index really pointless if there are only two distinct values? Given a table as follows (MySQL Database, InnoDB)
Id (BIGINT)
fullname (VARCHAR)
address (VARCHAR)
status (VARCHAR)
Further conditions:
The Database contains 300 Million records
Status can only be “enabled” and “disabled”
150 Million records have status = enabled and 150 Million records have status = disabled
My understanding is that, without an index on status, a select with where status='enabled' would result in a full table scan with 300 Million records to process?
How efficient is the lookup when I use a BTREE index on status?
Should I index this column or not?
What alternatives (maybe other kinds of index) does MySQL InnoDB provide to efficiently look up records by the where status='enabled' clause in the given example, with its very low cardinality/selectivity of values?
The index that you describe is pretty much pointless. An index is best used when you need to select a small number of rows in comparison to the total rows.
The reason for this is related to how a database accesses a table. Tables can be accessed either by a full table scan, where each block is read and processed in turn, or by a rowid or key lookup, where the database has a key/rowid and reads the exact row it requires.
In the case where you use a where clause based on the primary key or another unique index, eg. where id = 1, the database can use the index to get an exact reference to where the row's data is stored. This is clearly more efficient than doing a full table scan and processing every block.
Now back to your example: with a where clause of where status = 'enabled', the index will return 150M rows and the database will have to read each row in turn using separate small reads, whereas a full table scan allows the database to make use of more efficient larger reads.
There is a point at which it is better to just do a full table scan rather than use the index. With mysql you can use FORCE INDEX (idx_name) as part of your query to allow comparisons between each table access method.
Reference:
http://dev.mysql.com/doc/refman/5.5/en/how-to-avoid-table-scan.html
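A sketch of that comparison, assuming the table is called accounts and the index idx_status (both names are mine):

```sql
-- accounts and idx_status are hypothetical names.
-- Plan when the optimizer is pushed towards the index on status.
EXPLAIN SELECT * FROM accounts FORCE INDEX (idx_status)
WHERE status = 'enabled';

-- Plan when the index is taken out of consideration (full table scan).
EXPLAIN SELECT * FROM accounts IGNORE INDEX (idx_status)
WHERE status = 'enabled';
```

Running the two SELECTs themselves (without EXPLAIN) and timing them on a realistic copy of the data gives the actual comparison.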
I'm sorry to say that I do not agree with Mike. Adding an index is meant to limit the number of full records MySQL has to read, thereby limiting IO, which usually is the bottleneck.
This indexing is not free; you pay for it on inserts/updates, when the index has to be updated, and in the search itself, as it now needs to load the index (the full index for 300M records is probably not in memory). So it might well be that you get extra IO instead of limiting it.
I do agree with the statement that a binary variable is best stored as one, a bool or tinyint, as that decreases the length of a row and can thereby limit disk IO; comparisons on numbers are also faster.
If you need speed and you seldom use the disabled records, you may wish to have 2 tables, one for enabled and one for disabled records, and move the records when the status changes. As it increases complexity and risk, this would be my very last choice of course. Definitely do the move in 1 transaction if you happen to go for it.
It just popped into my head that you can check whether an index is actually used by using the explain statement. That should show you how MySQL is optimizing the query. I don't really know how MySQL optimizes queries, but from PostgreSQL I do know that you should explain a query on a database approximately the same (in size and data) as the real database. So if you have a copy of the database, create an index on the table and see whether it's actually used. As I said, I doubt it, but I most definitely don't know everything :)
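If you do go for the two-table layout, the move in one transaction could look something like this. The table names and the id value 42 are invented for the example; the columns are the ones from the question.

```sql
-- accounts_enabled / accounts_disabled are hypothetical tables with identical columns.
START TRANSACTION;

-- Copy the row to the "disabled" table ...
INSERT INTO accounts_disabled (id, fullname, address)
SELECT id, fullname, address
FROM accounts_enabled
WHERE id = 42;

-- ... and remove it from the "enabled" table.
DELETE FROM accounts_enabled WHERE id = 42;

-- Both changes become visible together; use ROLLBACK instead if anything failed.
COMMIT;
```

This relies on a transactional engine such as InnoDB, which the question says is in use.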
If the data is distributed 50:50, then a query like where status="enabled" will avoid scanning half of the table.
Whether an index on such a column is worthwhile depends entirely on the distribution of the data. For example, if 90% of the entries have status enabled and the other 10% are disabled, a query where status="disabled" scans only 10% of the table.
So having an index on such columns depends on the distribution of data.
@a'r's answer is correct; however, it needs to be pointed out that the usefulness of an index is determined not only by its cardinality but also by the distribution of data and the queries run on the database.
In OP's case, with 150M records having status='enabled' and 150M having status='disabled', the index is unnecessary and a waste of resource.
In case of 299M records having status='enabled' and 1M having status='disabled', the index is useful (and will be used) in queries of type SELECT ... where status='disabled'.
Queries of type SELECT ... where status='enabled' will still run with a full table scan.
You will hardly need all 150 million records at once, so I guess "status" will always be used in conjunction with other columns. Perhaps it'd make more sense to use a compound index like (status, fullname).
Jan, you should definitely index that column. I'm not sure of the context of the quote, but everything you said above is correct. Without an index on that column, you are most certainly doing a table scan on 300M rows, which is about the worst you can do for that data.
Jan, as asked, where your query involves simply "where status=enabled" without some other limiting factor, an index on that column apparently won't help (glad the SO community showed me what's up). If, however, there is a limiting factor, such as "limit 10", an index may help. Also, remember that indexes are also used in group by and order by optimizations. If you are doing "select count(*),status from table group by status", an index would be helpful.
You should also consider converting status to a tinyint where 0 would represent disabled and 1 would be enabled. You're wasting tons of space storing that string vs. a tinyint which only requires 1 byte per row!
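A possible way to do that conversion, sketched with made-up names (accounts for the table, is_enabled for the new column). On 300M rows each of these statements will take a while, so try it on a copy first.

```sql
-- accounts and is_enabled are hypothetical names.
-- 1. Add the compact flag column next to the existing string column.
ALTER TABLE accounts ADD COLUMN is_enabled TINYINT NOT NULL DEFAULT 0;

-- 2. Fill it from the old column: 1 for 'enabled', 0 for anything else.
UPDATE accounts SET is_enabled = IF(status = 'enabled', 1, 0);

-- 3. Once the new column has been verified, drop the string column.
ALTER TABLE accounts DROP COLUMN status;
```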
I have a similar column in my MySQL database. Approximately 4 million rows, with the distribution of 90% 1 and 10% 0.
I've just discovered today that my queries (where column = 1) actually run significantly faster WITHOUT the index.
Foolishly I deleted the index. I say foolishly, because I now suspect the queries (where column = 0) may have still benefited from it. So, instead I should explicitly tell MySQL to ignore the index when I'm searching for 1, and to use it when I'm searching for 0. Maybe.
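In case it helps anyone in the same situation, telling MySQL which way to go per value could look like this. The table events, the column flag and the index idx_flag are placeholders for my real names.

```sql
-- events, flag and idx_flag are hypothetical names.
-- Common value (about 90% of rows): skip the index and scan the table.
SELECT * FROM events IGNORE INDEX (idx_flag) WHERE flag = 1;

-- Rare value (about 10% of rows): push the optimizer towards the index.
SELECT * FROM events FORCE INDEX (idx_flag) WHERE flag = 0;
```

I haven't measured this, so treat it as something to test rather than a recommendation.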