I have a database table with three columns: id, hash, and name. I have to query this table using a WHERE clause on the hash value, which is a string. I have already loaded 100 million entries into it. I am trying to create an index using the following command:
CREATE INDEX i ON table (hash) USING HASH;
How much time will it take, and how much of a speedup will I gain once the index is in place? I am using MySQL, by the way.
Indexing will definitely help.
For better performance on MyISAM you should take a look at tuning these variables: key_buffer_size, myisam_max_sort_file_size, and myisam_sort_buffer_size.
I would recommend using InnoDB, though.
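If you stay on MyISAM, here is a minimal sketch of how you might raise those variables before building the index (the values are illustrative, not recommendations; size them to your RAM):

SET GLOBAL key_buffer_size = 512 * 1024 * 1024;                 -- cache for index blocks
SET GLOBAL myisam_max_sort_file_size = 64 * 1024 * 1024 * 1024; -- allow large temporary sort files
SET SESSION myisam_sort_buffer_size = 256 * 1024 * 1024;        -- buffer used when sorting the index

Then run the CREATE INDEX from that same session.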
I have a question about which storage engine to choose for my database tables. I have a table with 28 million records. I will insert data after creating the table; after that, no insert, update, or delete operation will take place. Ever. Only SELECT operations.
I have a query like the one below:
SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0#v','0%d','13#' ,'148' ,'1e1','1sx','1v$','1#c','1?b','1?k','226','2kl','2ue','2*l','2?4','36h','3au','3us','4d~') GROUP BY `indexVal`
This counts how many times a particular result appeared in a search. In InnoDB, this operation took 5 seconds. This is too much, because my original dataset will be in the billions.
For this kind of work, which MySQL storage engine do you recommend?
More than the storage engine, having the proper index in place seems important.
In your case, CREATE INDEX idx_1 ON key_word (indexVal, hashed_word) should help.
And if the data truly never changes, you could even pre-compute and cache some of those results.
For example:
CREATE TABLE counts AS
SELECT indexVal, hashed_word, COUNT(indexVal) AS cnt
FROM key_word
GROUP BY indexVal, hashed_word;
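The original query could then be answered from the much smaller pre-computed table (cnt being the alias introduced above):

SELECT indexVal, SUM(cnt)
FROM counts
WHERE hashed_word IN ('001','01v','0ji','0k9')
GROUP BY indexVal;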
For SELECT-only queries, ARCHIVE is the fastest storage engine.
As it is MyISAM-based (and the following advice holds for MyISAM as well): don't use VARCHAR but fixed-size CHAR columns, and you will get better performance.
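For illustration, a sketch of a fixed-width table definition, assuming (as in the sample query above) that hashed_word values are always 3 characters:

CREATE TABLE key_word (
  indexVal INT NOT NULL,
  hashed_word CHAR(3) NOT NULL,
  INDEX idx_1 (indexVal, hashed_word)
) ENGINE=MyISAM;

With only fixed-size columns, MyISAM uses its static row format, and locating a row becomes a simple offset calculation.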
Sure, and it's even faster if the data is loaded in memory instead of read from disk.
I have a table with 14 million rows, and I am trying to perform a full-text search on it. The query is performing really slowly, taking around 9 seconds for a simple boolean AND query. The same thing executes instantly on my private cluster. The size of this table is around 3.1 GB. Can someone explain this behavior of the RDS instance?
SELECT count(*)
FROM table_name WHERE id=97
AND match(body) against ('+data +big' IN BOOLEAN MODE)
A high IO rate often indicates insufficient memory or buffers that are too small. A 3 GB table, including indexes, should fit entirely in the memory of a (much less than) $500-per-month dedicated server.
MySQL has many different buffers, and as many parameters to fiddle with. The following buffers are the most important; compare their sizes in the two environments:
If InnoDB: innodb_buffer_pool_size
If MyISAM: key_buffer_size and read_buffer_size
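To compare them, you could run something like this on both the RDS instance and the private cluster:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'read_buffer_size';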
Have you added a FULLTEXT index on the body column? If not, try the one below; it should make a big difference:
ALTER TABLE `table_name` ADD FULLTEXT INDEX `bodytext` (`body`);
Hope it helps
Try this
SELECT count(1)
FROM table_name WHERE id=97
AND match(body) against ('+data +big' IN BOOLEAN MODE)
This should speed it up a little, since you don't have to count all the columns, just the rows.
Can you post the explain itself?
Since the DB version, table, indexes, and execution plans are the same, you need to compare the machine/cluster configurations. The main points of comparison: available CPU power, cores used in a single transaction, storage read speed, and memory size and read speed/frequency. Amazon provides a variety of configurations, so maybe your private cluster is simply much more powerful than the Amazon RDS instance config.
To add to the above, you can balance the load between CPU, IO, and memory to increase throughput.
Using MATCH() AGAINST() you search across your entire 3 GB full-text index, and there is no way to force another index in this case.
To speed up your query you need to make your full-text index lighter, so you can:
1 - clean all the useless characters and stopwords out of your full-text index
2 - create multiple full-text indexes and pick the appropriate one
3 - change full-text searches to a LIKE clause and force another index, such as the one on id (sketched below).
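A rough sketch of option 3, assuming id is the primary key (the LIKE patterns are illustrative; with leading wildcards MySQL still scans the rows matched by id, but that can be far fewer rows than a 3 GB full-text index):

SELECT COUNT(*)
FROM table_name FORCE INDEX (PRIMARY)
WHERE id = 97
  AND body LIKE '%data%'
  AND body LIKE '%big%';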
Try placing id in the full-text index and say:
MATCH(body, id) AGAINST ('+big +data +97' IN BOOLEAN MODE) AND id = 97
You might also look at Sphinx, which can be used with MySQL easily.
I have a MySQL table that has 165,716 records (and counting). The table is 233.3 MB large. Now I want to add a FULLTEXT index to a column in that table. Is that possible, or is it going to be a problem?
Kind regards
It's possible if the table is running the MyISAM engine, but it's likely to take a long time to complete the initial indexing.
[edit] I misread the size - I thought it was 2.3 GB, not 233 MB! If that's the case, the indexing shouldn't take that long.
I was wondering: if I add one index for each field in every table of my DB, will that make my queries run faster?
Or do I have to analyze my queries and create indexes only when required?
Adding an index on each column will probably make most of your queries faster, but it's not necessarily the best approach. It is better to tune your indexes to your specific queries, using EXPLAIN and performance measurements to guide you in adding the correct indexes.
In particular you need to understand when you shouldn't index a column, and when you need multi-column indexes.
I would advise reading the MySQL manual for optimization of SELECT statements which explains under what conditions indexes can be used.
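For instance, a quick way to check whether a given query can use an index (the table and column names here are hypothetical):

EXPLAIN SELECT name FROM customers WHERE email = 'a@example.com';
-- if the "key" column of the EXPLAIN output is NULL, no index was used,
-- and an index on the filtered column is the first thing to try:
CREATE INDEX idx_customers_email ON customers (email);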
The more indexes you have, the heavier inserting/updating gets, so it's a tradeoff. The SELECT queries that cannot use an index now will of course get quicker, but if you check which fields you're joining on (or using in a WHERE clause), you will not trade off that much.
(And, of course, there is the disk space, but most of the time I don't really care about that :) )
Another point is that MySQL can only use a single index per table in a query, so if your query is
SELECT * FROM table WHERE status = 1 AND col1 = 'foob' AND col2 = 'bar'
MySQL will use one of the indexes and filter out the rest while reading the data from the table.
If you have queries like this, it's better to create a composite index on (status, col1, col2), as shown below.
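A sketch of that composite index, using the names from the query above (table is backquoted only because it is a reserved word):

CREATE INDEX idx_status_col1_col2 ON `table` (status, col1, col2);

With this index all three conditions can be resolved in the index, and thanks to the leftmost-prefix rule it also serves queries that filter on status alone, or on status and col1.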
Adding an index on every field in every table is not smart.
You should add indexes ONLY on columns that you use in the WHERE clause of a SELECT, or on which you sort.
Often, the best results are achieved by using multi-column indexes that are specific to your SQL selects.
There are also partial (prefix) indexes, with a limit on the length of the field, which can be used to optimize performance and reduce the index size.
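For example, a prefix index on the first 10 characters of a column (hypothetical table and column names):

CREATE INDEX idx_email_prefix ON users (email(10));

The index stores only the 10-character prefixes, so it is much smaller, at the cost of extra row lookups when prefixes collide.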
Every unnecessary index will slow down the database during the insert because on every insert, every index has to be updated.
Also, the more indexes you have, the greater the chance of data corruption. And lastly, indexes take extra storage space on disk, sometimes a lot of it.
MySQL also tries to keep indexes in memory. If you have unnecessary indexes, there is a good chance MySQL will end up filling the available memory with them, in which case your performance will degrade considerably.
Creating the right kind of indexes is probably the single most important optimization technique. That's why, when someone asks something like this, I assume it is a joke.
This question can only be asked by someone who has not read a single book on MySQL. Just get a good book and read it; then you will not have to ask questions like this.
I have a MySQL table with just under 6 million rows.
I am doing a very simple query on this table, and believe I have all the correct indexes in place.
The query is:
SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date
The EXPLAIN returns:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE updateshows range date_idx date_idx 7 NULL 648997 Using where
So I am using the correct index, as far as I can tell, but this query takes 11 seconds to run.
The database is MyISAM, and phpMyAdmin says the table is 1.0 GiB.
One day I suspect I'll have to learn Hadoop and transfer all this data to a non-structured database, but I'm surprised to find the performance degrading so significantly in such a short period of time.
Any ideas here?
Edited:
The date_idx index covers both the date and venid columns. Should those be two separate indexes?
What you want to make sure is that the query will use ONLY the index, so make sure the index covers all the fields you are selecting. Also, since there is a range condition involved, you need to have venid first in the index, since it is queried as a constant. I would therefore create an index like so:
ALTER TABLE events ADD INDEX indexNameHere (venid, date, time);
With this index, all the information needed to complete the query is in the index. This means that, hopefully, the storage engine is able to fetch the information without actually seeking inside the table itself. However, MyISAM might not be able to do this, since it doesn't store the data in the leaves of the indexes, so you might not get the speed increase you desire. If that's the case, try to create a copy of the table, and use the InnoDB engine on the copy. Repeat the same steps there and see if you get a significant speed increase. InnoDB does store the field values in the index leaves, and allows covering indexes.
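A minimal sketch of that experiment (events_innodb is a made-up name; this assumes the covering index above was already added to events, so CREATE TABLE ... LIKE carries it over):

CREATE TABLE events_innodb LIKE events;          -- copies columns and indexes
ALTER TABLE events_innodb ENGINE = InnoDB;       -- switch the empty copy to InnoDB
INSERT INTO events_innodb SELECT * FROM events;  -- load the 6M rows

Then run the same EXPLAIN against events_innodb and compare.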
Now, hopefully you'll see the following when you explain the query:
mysql> EXPLAIN SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date;
id select_type table type possible_keys key [..] Extra
1 SIMPLE events range date_idx, indexNameHere indexNameHere Using index, Using where
Try adding a key that spans venid and date (or the other way around, or both...)
I would imagine that a 6M-row table can be optimised with quite normal techniques.
I assume that you have a dedicated database server, and it has a sensible amount of ram (say 8G minimum).
You will want to ensure you've tuned MySQL to use your RAM efficiently. If you're running a 32-bit OS, don't. If you are using MyISAM, tune your key buffer to use a significant proportion, but not too much, of your RAM.
In any case you want to run repeated performance testing on production-grade hardware.
Try putting an index on the venid column.