I am using the InnoDB engine in MySQL version 5.1.61-community-log. The text to be searched is less than 100 characters long. Does indexing the text improve the performance of queries with LIKE '%searchstring%'?
EDIT:
I am using this query for a jQuery auto-suggest component.
I am using a third-party web hosting service, so upgrading is not an option for me.
No, adding an index won't help your LIKE '%foo%' queries (much*). A LIKE condition can only use the index efficiently if the pattern has a constant prefix, such as LIKE 'foo%'.
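One way to see the difference is to compare EXPLAIN output for the two patterns. This is a minimal sketch: the table products, its name column, and the index name idx_name are hypothetical and not taken from the question.

```sql
-- Hypothetical table, column and index names.
CREATE INDEX idx_name ON products (name);

-- Constant prefix: the optimizer can do a range scan on idx_name.
EXPLAIN SELECT * FROM products WHERE name LIKE 'foo%';

-- Leading wildcard: there is no prefix to seek on, so every row is examined.
EXPLAIN SELECT * FROM products WHERE name LIKE '%foo%';
```

On a reasonably sized table the first EXPLAIN would typically report a range access on idx_name, while the second falls back to a full scan.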
You should consider a full text search instead. In MySQL 5.5 and older, only the MyISAM engine supports full text search; in MySQL 5.6 and newer, InnoDB supports it as well.
If you are using an older version of MySQL, are unable to upgrade, and are unable to change the storage engine (because you need other features of InnoDB such as foreign key constraints), then you could consider creating a new MyISAM table which stores only the primary key and the text column. You can use this table to perform fast full text searches and join to the original table if you also need access to the other columns.
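The side-table approach could look roughly like the following. It is only a sketch under assumptions: articles (the InnoDB table), articles_search, and the id and body columns are hypothetical names.

```sql
-- Hypothetical MyISAM side table: primary key plus the searchable text only.
CREATE TABLE articles_search (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  body TEXT NOT NULL,
  FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM;

-- Copy the key and the text column from the InnoDB table.
INSERT INTO articles_search (id, body)
SELECT id, body FROM articles;

-- Search the side table, then join back for the other columns.
SELECT a.*
FROM articles_search AS s
JOIN articles AS a ON a.id = s.id
WHERE MATCH(s.body) AGAINST('searchstring');
```

Note that full text search matches whole words, and by default MyISAM ignores words shorter than ft_min_word_len (4), so it behaves differently from a substring LIKE.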
You could also consider using an external text search engine such as:
Sphinx
Lucene
(*) If the index you add is a covering index for your query, you will get a small improvement by adding an index. Due to the smaller width of the index, a full scan of the index will be faster than a full scan of the entire table.
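As a rough illustration of the covering-index case (table, column and index names are hypothetical), the query below asks only for columns that are present in the index, so MySQL can scan the index instead of the table:

```sql
-- Hypothetical index on the searched column.
-- InnoDB secondary indexes also carry the primary key (id).
CREATE INDEX idx_name ON products (name);

-- Only id and name are requested, so idx_name covers the query.
-- EXPLAIN would be expected to show type=index and "Using index" in Extra.
EXPLAIN SELECT id, name FROM products WHERE name LIKE '%foo%';
```

Every index entry is still examined; the gain comes only from reading a narrower structure.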
I have a table in a MySQL database that is about 45GB, and my available disk space is 30GB. The table type is MyISAM and the MySQL version is 5.6. I want to create a FULLTEXT index on two columns of the table, called name and text, where name is varchar(255) and text is longtext.
Will MySQL create a temporary table of about the same size as the table (45GB) when creating the index?
Which is better to use:
ALTER TABLE ADD FULLTEXT or CREATE FULLTEXT INDEX in the above context?
The two commands are the same (one maps to the other). Either one makes a full copy of the table with all its indexes, including the new one. So it is not possible here: you have less free disk space than the size of the table alone.
Get a bigger machine? Remove other stuff? Do other cleanup?
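For reference, the two equivalent forms would look something like this. The table name documents and the index name ft_name_text are hypothetical; name and text are the columns from the question.

```sql
-- Run one or the other, not both. `text` is quoted because it is also a type keyword.
ALTER TABLE documents ADD FULLTEXT INDEX ft_name_text (name, `text`);

CREATE FULLTEXT INDEX ft_name_text ON documents (name, `text`);
```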
Note further, the FT index may be close to 45GB, itself. So, even if you could create the index, it might overflow your 30GB.
InnoDB's FULLTEXT seems to create a much smaller index. But still, converting to InnoDB cannot be done without more than 45GB free space.
I'm currently using a utf8 MySQL database. My application checks whether a translation is already in the database; if not, it performs the translation and stores it in the database.
SELECT * FROM `translations` WHERE `input_text`=? AND `input_lang`=? AND `output_lang`=?;
(The other field is "output_text".) For a basic database, it would first compare, letter by letter, the input text with the "input_text" "TEXT" field. As long as the characters are matching it would keep comparing them. If they stop matching, it would go onto the next row.
I don't know how databases work at a low level but I would assume that for a basic database, it would search at least one character from every row in the database before it decides that the input text isn't in the database.
Ideally the input text would be converted to a hash code (e.g. using sha1) and each "input_text" would also be a hash. Then if the database is sorted properly it could rapidly find all of the rows that match the hash and then check the actual text. If there are no matching hashes then it would return no results even though each row wasn't manually checked.
Is there a type of mysql storage engine that can do something like this or is there some additional php that can optimize things? Should "input_text" be set to some kind of "index"? (PRIMARY/UNIQUE/INDEX/FULLTEXT)
Is there an alternative type of database that is compatible with PHP and far superior to MySQL?
edit:
This talks about B-Tree vs Hash indexes for MySQL:
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
None of the limitations for hash indexes are a problem for me. It also says
They are used only for equality comparisons that use the = or <=> operators (but are very fast)
["very" was italicized by them]
NEW QUESTION:
How do I set up "input_text" TEXT to be a hash index? BTW multiple rows contain the same "input_text"... is that alright for a hash index?
http://dev.mysql.com/doc/refman/5.5/en/column-indexes.html
Says "The MEMORY storage engine uses HASH indexes by default" - does that mean I've just got to change the storage engine and set the column index to INDEX?
A normal INDEX should be enough (be sure to index all the fields in your WHERE clause; it'll be bigger on disk, but faster). FULLTEXT indexes are for word searches of the kind you might otherwise attempt with LIKE; they are queried with MATCH ... AGAINST and won't help an equality lookup like this one ;-)
Anyway, for that kind of lookup you could use a NoSQL store like Redis. It's blazingly fast, keeps its data in memory, and also persists data through snapshots.
There is an extension for PHP here: https://github.com/nicolasff/phpredis
You'll have Redis keys in the following form: YOUR_PROJECT:INPUT_LANG:WORD:OUTPUT_LANG for better data management. Just replace each part with your own values and you're good to go ;)
An index will speed up the lookups a lot.
By default, indexes in InnoDB and MyISAM use search trees (B-trees). There is a limit on the length of an index key, so you can index only the first ~700 bytes of the text (for InnoDB the per-column limit is 767 bytes, which is 255 three-byte utf8 characters).
CREATE INDEX txt_lookup ON translations (input_lang, output_lang, input_text(255));
This will create an index on input_lang, output_lang and the first 255 characters of input_text.
When you run your example query, MySQL uses the index to quickly find the rows with the matching languages and the same first 255 characters. It then does the slow full-length string comparison only on the small set of rows that the index returned.
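The hashing idea from the question can also be approximated without a HASH index, by storing a hash of the text in its own column and putting an ordinary B-tree index on it. This is only a sketch: the input_hash column and the hash_lookup index are hypothetical additions, and it assumes the application sends the text in the same character encoding every time.

```sql
-- Hypothetical extra column holding SHA1(input_text) as 40 hex characters.
ALTER TABLE translations
  ADD COLUMN input_hash CHAR(40) CHARACTER SET ascii NOT NULL DEFAULT '',
  ADD INDEX hash_lookup (input_hash, input_lang, output_lang);

-- Backfill existing rows; new rows need the hash set on insert as well.
UPDATE translations SET input_hash = SHA1(input_text);

-- Lookup: the index narrows by hash and languages; the final comparison
-- on input_text guards against hash collisions.
-- Parameters: input text, input language, output language, input text again.
SELECT *
FROM translations
WHERE input_hash = SHA1(?)
  AND input_lang = ?
  AND output_lang = ?
  AND input_text = ?;
```

Several rows sharing the same input_text simply share the same hash value, which is fine for a non-unique index.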
In PostgreSQL, we can search table based on full text search like this -
SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');
Source - http://www.postgresql.org/docs/current/static/textsearch-tables.html
How can we do a similar search in MySQL 5.5, the way it is done quite easily in PostgreSQL?
You probably want MySQL's full text search functionality. Essentially you create a FULLTEXT index then search against it using MATCH() ... AGAINST.
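A rough MySQL counterpart of the PostgreSQL query might look like this. It assumes a table pgweb with title and body columns as in the question; the index name ft_body is made up, and on MySQL 5.5 the table has to use the MyISAM engine for the FULLTEXT index to be allowed.

```sql
-- On MySQL 5.5 this requires pgweb to be a MyISAM table.
ALTER TABLE pgweb ADD FULLTEXT INDEX ft_body (body);

-- Natural-language search for the word "friend".
SELECT title
FROM pgweb
WHERE MATCH(body) AGAINST('friend');
```

The results are not identical to PostgreSQL's: MySQL's full text search does no stemming, so 'friend' will not match 'friends' here, and MyISAM natural-language searches ignore words that appear in half or more of the rows.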
I'm not aware of a facility to set the search language per-query in MySQL, but that doesn't mean no such support exists. It wasn't clear if per-query language settings were a requirement for you.
The latest stable release of MySQL supports full text search on the modern transactional and crash-safe InnoDB table type as well as the unsafe MyISAM table type. If your MySQL only does FTS on MyISAM it's time to upgrade. 5.6 supports full text search on InnoDB.
Alternately, if you really can't upgrade, you can store your important data in InnoDB tables and run a periodic query to update a MyISAM table you use as a materialized view for fulltext search only:
Create a new MyISAM table
INSERT INTO ... SELECT the data from the InnoDB table into the new MyISAM table
CREATE the fulltext index on the new MyISAM table
DROP the old MyISAM table you were using for fulltext indexing; and
finally ALTER TABLE ... RENAME the new MyISAM table to have the name of the old one.
You'll have a very short window during which the fulltext index is unavailable, between dropping the old table and renaming the new one. Your data also goes stale between refreshes, though you may be able to work around that with triggers (I don't use MySQL enough to know). If you can't live with these limitations, upgrade to 5.6.
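The refresh steps above could be scripted roughly as follows. All table and column names (articles, articles_ft, articles_ft_new, id, body) are hypothetical. As a variation on the drop-then-rename order, this sketch swaps the tables with a single RENAME TABLE statement, which MySQL performs atomically, and drops the old copy afterwards.

```sql
-- 1. Create a new MyISAM table.
CREATE TABLE articles_ft_new (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  body TEXT NOT NULL
) ENGINE=MyISAM;

-- 2. Copy the data from the InnoDB table.
INSERT INTO articles_ft_new (id, body)
SELECT id, body FROM articles;

-- 3. Build the fulltext index after loading.
CREATE FULLTEXT INDEX ft_body ON articles_ft_new (body);

-- 4. Swap old and new in one statement, then drop the old copy.
RENAME TABLE articles_ft     TO articles_ft_old,
             articles_ft_new TO articles_ft;
DROP TABLE articles_ft_old;
```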
MySQL's full text search offers control of stopwords and other tuning. It's a solid offering that should do the job nicely.
I realize that MySQL 5.6 is still in beta, but does anyone have experience using the new InnoDB FTS engine? How does it compare to something like Sphinx?
Thanks
Jason
Never used Sphinx, but I tried MySQL 5.6 FTS on an InnoDB table with about 170k rows. I made an FTS index on the name column (which contains all names of a person). To find a word in any position of the string, MATCH(name) AGAINST("+word*" IN BOOLEAN MODE) works a lot faster (2-3 times in my case) than using name LIKE "word%" OR name LIKE "% word%". However, when making joins, do check EXPLAIN to see whether the FTS index is actually used. It seems the MySQL optimizer is not that good at guessing when the FTS index should be used.
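For illustration, the comparison could be set up along these lines. The table people and the index name ft_name are hypothetical; only the name column comes from the description above.

```sql
-- Hypothetical InnoDB table people(name) on MySQL 5.6 or newer.
CREATE FULLTEXT INDEX ft_name ON people (name);

-- Word-prefix search through the FTS index.
SELECT * FROM people
WHERE MATCH(name) AGAINST('+word*' IN BOOLEAN MODE);

-- LIKE version, which cannot use the FTS index.
SELECT * FROM people
WHERE name LIKE 'word%' OR name LIKE '% word%';

-- Check the plan: when the FTS index is used, the type column shows "fulltext".
EXPLAIN SELECT * FROM people
WHERE MATCH(name) AGAINST('+word*' IN BOOLEAN MODE);
```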
The FULLTEXT feature that formerly required downloading a special build from labs.mysql.com is now part of the mainline MySQL build in 5.6.5 and up (still in beta). The documentation for the FULLTEXT functions now includes the InnoDB-specific details: MySQL Full-Text Search Functions
Remember that Sphinx was developed specifically for full text searching, while in MySQL it's just a feature...
Here is a comparison of Sphinx and MySQL FTS:
http://www.percona.com/files//presentations/opensql2008_sphinx.pdf
Here is a performance test of InnoDB FTS compared to MyISAM:
http://blogs.innodb.com/wp/2011/07/innodb-fts-performance/
InnoDB is a bit faster, especially in indexing, but it's still far from Sphinx performance...
Is it bad practice to create a mirrored table (MyISAM) of the records in an InnoDB table for the purposes of doing full-text searches? I figure this way I'm just searching a copy of the data and if anything happens to that data it's not as big of a deal because it can always be re-created. But, it just feels awkward.
(MyISAM is the only engine that supports full-text searching, but I need to use the foreign key constraints offered by InnoDB)
Should I avoid this?
First of all, have you considered using a good search indexer? For example, Lucene (http://lucene.apache.org/java/docs/) will speed up searches a lot, as it builds its own index tables.
If you definitely want to use the built-in MySQL full-text search, you could cut down the MyISAM table so that it contains just the text data you want to search and the primary key, and then retrieve the rest of the data from the normal InnoDB tables once you know the primary key. That would avoid duplicating the other data in the table.
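If the cut-down mirror needs to stay current, triggers on the InnoDB table are one possible way to maintain it. This is a sketch under assumptions: articles, articles_search, id, body and the trigger names are all hypothetical, and because MyISAM is not transactional, changes written to the mirror are not undone when the InnoDB transaction rolls back.

```sql
-- Keep the hypothetical MyISAM mirror articles_search(id, body) in sync.
CREATE TRIGGER articles_ai AFTER INSERT ON articles
FOR EACH ROW
  INSERT INTO articles_search (id, body) VALUES (NEW.id, NEW.body);

CREATE TRIGGER articles_au AFTER UPDATE ON articles
FOR EACH ROW
  UPDATE articles_search SET body = NEW.body WHERE id = NEW.id;

CREATE TRIGGER articles_ad AFTER DELETE ON articles
FOR EACH ROW
  DELETE FROM articles_search WHERE id = OLD.id;
```

Since the mirror can always be rebuilt from the InnoDB data, an occasional full refresh can correct any drift.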