Currently I have the following query...
SELECT id,
LOWER(title) as title,
LOWER(sub_title) as sub_title
FROM ebay_archive_listing
WHERE MATCH(title, sub_title) AGAINST ("key" IN BOOLEAN MODE)
However it is not finding rows where the title contains the word "key". "key" is generated dynamically based on a set of keywords, so sometimes it contains + and - symbols.
MySQL's default fulltext indexing settings won't index or match any word of fewer than four letters.
If you have a phrase made entirely of words of fewer than four letters, like key, you have to manually fall back to an unindexed LIKE search instead.
You can change this behaviour by lowering the ft_min_word_len setting. You might also want to change or disable the list of ‘stopwords’ (which are also not indexed) whilst you're at it, as the default list is brutal and bizarre.
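The manual fallback could be decided in application code before the query is sent. Here is a minimal sketch in Python, assuming the server still uses the MyISAM default of 4 for ft_min_word_len and a DB-API driver with %s placeholders. The names split_words and build_search are hypothetical, and the LIKE branch does not try to reproduce the meaning of + and - operators.

```python
import re

# Assumed to match the server's ft_min_word_len (MyISAM default: 4).
FT_MIN_WORD_LEN = 4

BASE_SQL = (
    "SELECT id, LOWER(title) AS title, LOWER(sub_title) AS sub_title "
    "FROM ebay_archive_listing WHERE "
)

def split_words(phrase):
    """Extract plain words, dropping boolean operators such as + and -."""
    return re.findall(r"[^\W_]+", phrase)

def build_search(phrase, min_len=FT_MIN_WORD_LEN):
    """Return (sql, params): full-text search, or LIKE if no word is indexable."""
    words = split_words(phrase)
    if not words:
        raise ValueError("search phrase contains no words")
    if any(len(w) >= min_len for w in words):
        # At least one word is long enough to be in the full-text index.
        sql = BASE_SQL + "MATCH(title, sub_title) AGAINST (%s IN BOOLEAN MODE)"
        return sql, (phrase,)
    # Every word is too short to be indexed: unindexed scan, one test per word.
    conditions = " AND ".join(
        "(title LIKE %s OR sub_title LIKE %s)" for _ in words
    )
    params = tuple(p for w in words for p in (f"%{w}%", f"%{w}%"))
    return BASE_SQL + conditions, params
```

With these assumptions, build_search("key") takes the LIKE branch, while build_search("+keyboard -key") keeps the MATCH ... AGAINST query because "keyboard" is long enough to be indexed.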
I'm building a query that is used by an autocomplete function on a website. The field "term" has a FULLTEXT index. My query should float the most relevant results to the top of the list, but in some cases the most obvious match is not given enough relevancy.
Here's one example. I have a product term "Just Believe Bird Feeder". It does show up in a search for that exact phrase, but with a lower relevancy than terms that contain one of the search words more than once (e.g. "bird tube bird feeder").
Further, searching on "believe" or "just believe" yields zero results.
What would be my best solution to overcome this?
SELECT
term,
MATCH (term) AGAINST (
'Just Believe Bird Feeder' IN NATURAL LANGUAGE MODE
) AS relevancy
FROM
autocomplete
WHERE
MATCH (term) AGAINST (
'Just Believe Bird Feeder' IN NATURAL LANGUAGE MODE
)
ORDER BY
relevancy DESC
LIMIT 15
Your words believe and just are on the MyISAM stopword list. Words on that list are ignored when indexing (or searching) with the fulltext index, so you can neither find them, nor will they influence the relevance score.
The idea of a stopword list is to exclude words that are so common in English texts that their occurrence bears no relevance. This feature is less useful for searching in short titles, product codes or artificial term lists, though.
You can adjust the ft_stopword_file configuration setting to specify your own stopword list: set it to an empty string to disable stopwords completely, or give the filename of your own list. After changing the setting and restarting the server, you need to rebuild the indexes, e.g. by using REPAIR TABLE tbl_name QUICK.
If you cannot control the server configuration, you could switch your table to InnoDB, which uses a significantly smaller stopword list.
Some additional notes:
the fulltext index uses a minimum word length, by default 4 for MyISAM and 3 for InnoDB. You may need to adjust those settings too if you want terms like "8 oz" to have an effect.
the order of terms has no effect on the relevance in a fulltext search
After many tries and many searches I came to the following query:
SELECT id,
title,
description,
MATCH(title,description,tags) AGAINST ('$search' IN NATURAL LANGUAGE MODE) AS score
FROM pages
I use this query to search a large number of pages that contain music lessons. It works quite well. Pages get a score based on how well they match the query.
The exception is when users search for something like "C Chord" or "Am Chord": the first part is ignored. If I search for "A chord" or "E chord", the page called "C chord" is always on top.
So, part 1 of my question is: how can I fix that?
Then part 2: is it possible to give the column "Title" more weight in the score than "Description"?
MySQL has two important settings for full-text search: the minimum word length and the stopword list. The first is the minimum word length (documented here):
InnoDB: innodb_ft_min_token_size (default 3)
MyISAM: ft_min_word_len (default 4)
Words shorter than the minimum are not indexed, so you cannot search on them. Remember to rebuild the index after changing the parameter. Conveniently (hah!) they have different default values.
In addition, there are stop word lists to remove common stop words. Whether or not this is an issue depends on what words you are searching for. You can customize the stop words.
Part 1
This has been discussed on SO quite a few times: MySQL's built-in fulltext parser is designed for searching for words, not for single characters, and comes with a default minimum word length of 3 (InnoDB) or 4 (MyISAM). These settings mean that words shorter than 3 or 4 characters are not indexed and therefore will not be found by a fulltext search. You may lower the minimum length to 1 and rebuild the index, but it will slow searching down, since the indexes will be bigger.
Part 2
It is possible, but you need to search on the title field separately and bump up the relevancy score of the results from the title field.
You can use union to get a combined list with sum() to sum the score up for any record:
SELECT p.id, any_value(title), any_value(description), any_value(tags), sum(t.score) as sum_score
FROM
(SELECT id, (MATCH(title) AGAINST ('$search' IN NATURAL LANGUAGE MODE)) *2 AS score
FROM pages
UNION ALL
SELECT id, MATCH(description,tags) AGAINST ('$search' IN NATURAL LANGUAGE MODE) AS score
FROM pages) t
INNER JOIN pages p on t.id=p.id
GROUP BY p.id
ORDER BY sum(t.score) DESC
You need to adjust the fulltext indexes to be able to do the separate searches.
I have strings like the following in my VARCHAR InnoDB table column:
"This is a {{aaaa->bbb->cccc}} and that is a {{dddd}}!"
Now, I'd like to search for e.g. {{xxx->yyy->zzz}}. The brackets are part of the string. Sometimes it is searched together with another column, but that one only contains an ordinary id and hence doesn't need to be considered (I guess).
I know I can use LIKE or REGEXP. But these (already tried) ways are too slow. Can I introduce a fulltext index? Or should I add another helping table? Should I replace the special characters {, }, -, > to get words for the fulltext search? Or what else could I do?
The search works with some ten-thousand rows and I assume that I often get about one hundred hits.
This link should give you all the info you need regarding FULLTEXT indexes in MySQL.
MySQL dev site
The section that you will want to pay particular attention to is:
"Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row."
So, in short: you should see an improvement in query execution times by adding a full-text index on wide VARCHAR columns, provided you are using a compatible storage engine (InnoDB or MyISAM).
Also here is an example of how you can query the full text index and also an additional ID field as hinted in your question:
SELECT *
FROM tbl_name
WHERE MATCH (fieldlist) AGAINST ('search text here')
AND field2 = '1234';
Let's say we have the following query:
SELECT *
FROM companies
WHERE name LIKE '%nited'
It returns
name
united
How do I write a query using MySQL's full-text search that will provide similar results?
Unfortunately you cannot do this using a MySQL full-text index. You cannot retrieve '*nited states' instantly from index because left characters are the most important part of the index. However, you can search 'United Sta*'.
-- the only possible wildcard full-text search in MySQL
WHERE MATCH(column) AGAINST ('United Sta*' IN BOOLEAN MODE)
MySQL's full-text performs best when searching whole words in sentences - even that can suck at times. Otherwise, I'd suggest using an external full-text engine like Solr or Sphinx. I think Sphinx allows prefix and suffix wildcards, not sure about the others.
You could go back to MySQL's LIKE clause, but again, queries like LIKE '%nited states' or LIKE '%nited Stat%' will also perform poorly, as they can't use the index on the first few characters. 'United Sta%' and 'Unit%States' are okay, as the index can be used against the first bunch of known characters.
Another quite major caveat of MySQL's full-text indexing is the stop-word list and the minimum word length setting. For example, on a shared hosting environment, you will be limited to words of at least 4 characters, so searching 'Goo' to get 'Google' would fail. The stop-word list also disallows common words like 'and', 'maybe' and 'outside'; in fact, there are 548 stop-words altogether! Again, if you are not using shared hosting, these settings are relatively easy to modify, but if you are, then you will get annoyed with some of the defaults.
You can use MySQL's full-text index, but you must configure the parser to be the n-gram parser.
If your data is English (as opposed to Chinese or similar), you ought to also increase the ngram_token_size parameter to the minimum search term length you are willing to have. Otherwise, the search will be unacceptably slow.
You will also want to set innodb_ft_enable_stopword=0, otherwise an idiosyncrasy of how ngram stopword handling works will mean that many useful queries will return no results.
To explain why you must also increase ngram_token_size, you may think of this index as the following schema. MySQL then does a series of joins to find the results which match the search term:
CREATE TABLE fulltext_index
(
docid int(11) NOT NULL,
term char(2) NOT NULL,
PRIMARY KEY (docid, term),
INDEX term_idx (term)
);
The n-gram (2) parser breaks each word in your query into segments like se, eg, gm, me, en, nt, ts. For each of these n-grams, there are many results in English, so the index doesn't help much since it ends up iterating over everything anyway. Meanwhile, you can see how Chinese 随机的 would split into a much more useful 随机 and 机的. With n-gram size set to 4, the segments are segm, egme, gmen, ment, ents. These larger segments are much more likely to be unique, so each segment narrows down the search space significantly.
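To see the segments for yourself, the splitting can be imitated with a sliding window. This is only a sketch of the idea in Python, not MySQL's actual parser, and the function name ngrams is made up for illustration.

```python
def ngrams(word, n):
    """Split `word` into overlapping n-character tokens (sliding window)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

print(ngrams("segments", 2))  # ['se', 'eg', 'gm', 'me', 'en', 'nt', 'ts']
print(ngrams("segments", 4))  # ['segm', 'egme', 'gmen', 'ment', 'ents']
print(ngrams("随机的", 2))      # ['随机', '机的']
```

A word shorter than n yields no tokens at all in this model, which is why the token size should not exceed the shortest search term you want to support.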
Disabling stopwords is also necessary because the ngram parser excludes all n-grams that contain any of the stopwords. For example, with an n-gram (4) parser, stopword will be parsed into stop, topw, opwo, pwor, and word:
stop will be excluded because it contains "to"
topw will be excluded because it contains "to"
opwo will be kept
pwor will be excluded because it contains "or"
word will be excluded because it contains "or"
Because these tokens are excluded from the index, a search for MATCH(name) AGAINST('stop' IN BOOLEAN MODE) would not return anything unless the stopwords mechanism is disabled before creating the index.
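The exclusion rule can be modelled in a few lines. This is a rough Python sketch, assuming a plain substring test and a stopword set reduced to the two words mentioned above ("to" and "or"); the real default list is longer, and kept_tokens is a hypothetical helper name.

```python
# Illustrative subset only; the server's real stopword list is longer.
STOPWORDS = {"to", "or"}

def ngrams(word, n):
    """Split `word` into overlapping n-character tokens (sliding window)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def kept_tokens(word, n, stopwords=STOPWORDS):
    """Return the n-grams of `word` that contain none of the stopwords."""
    return [
        token for token in ngrams(word, n)
        if not any(stop in token for stop in stopwords)
    ]

print(ngrams("stopword", 4))       # ['stop', 'topw', 'opwo', 'pwor', 'word']
print(kept_tokens("stopword", 4))  # ['opwo']
```

Only one of the five tokens survives in this model, so most 4-character search terms taken from that word would have nothing in the index to match.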
To answer your question,
set ngram_token_size to 3, 4, or whatever your minimum search term length is.
set innodb_ft_enable_stopword to 0 or OFF.
create the index with CREATE FULLTEXT INDEX companies_name_idx ON companies (name) WITH PARSER ngram;
SELECT * FROM companies WHERE MATCH(name) AGAINST('nited' IN BOOLEAN MODE);
This will also return results for nitedA, so you might want to further filter the results from there, if that's required for your application.
I use this query for a full-text search in my table:
SELECT Titel FROM cmsa WHERE MATCH(Titel) AGAINST ('+"Ort" +"Berlin"' IN BOOLEAN MODE)
but the result is empty.
If I use
SELECT Titel FROM cmsa WHERE Titel LIKE '%Berlin%'
the result would be (without quotes):
"Ort - Berlin"
Why didn't the full-text search find this result? The word "Ort" and the word "Berlin" are both in the field Titel of the entry.
Other full-text searches work great.
Any idea?
I guess it is because MySQL has a server parameter for the minimum length of a word to be included in a FULLTEXT index. Its default value is 4, so your first word, Ort, is not included in the index. You should change this system parameter, restart the server and then rebuild all FULLTEXT indexes.
REPAIR TABLE cmsa QUICK;
Change the full text index minimum word length with MySQL
Try without double quotes, and make sure the table uses a storage engine with full-text support (MyISAM, or InnoDB from MySQL 5.6 onwards)
SELECT
Titel
FROM
cmsa
WHERE
MATCH(Titel) AGAINST ('+Ort +Berlin' IN BOOLEAN MODE)
Some explanation
Boolean Mode Searches
SELECT headline, story FROM news
WHERE MATCH (headline,story)
AGAINST ('+Hurricane -Katrina' IN BOOLEAN MODE);
The above statement would match news stories that mention hurricanes but not those that also mention Katrina.
Query Expansion
The Blind Query Expansion (or automatic relevance feedback) feature can be used to expand the results of the search. This often includes much more noise, and makes for a very fuzzy search.
In most cases you would use this when the user's query returned just a few results: you run it again WITH QUERY EXPANSION, and it adds words that are commonly found together with the words in the query.
SELECT headline, story FROM news
WHERE MATCH (headline,story)
AGAINST ('Katrina' WITH QUERY EXPANSION);
The above query might return all news stories about hurricanes, not just ones containing Katrina.
A couple of points about Full-Text searching in MySQL:
Searches are not case sensitive
Short words are ignored; the default minimum length is 4 characters (MyISAM). You can change the min and max word length with the variables ft_min_word_len and ft_max_word_len
Words called stopwords are ignored. You can specify your own stopwords, but default words include the, have, some - see the default stopwords list.
You can disable stopwords by setting the variable ft_stopword_file to an empty string.
Before MySQL 5.6, Full Text searching was only supported by the MyISAM storage engine; from 5.6 onwards InnoDB supports it too.
If a word is present in more than 50% of the rows it will have a weight of zero. This has advantages on large datasets, but can make testing difficult on small ones.
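When a small test table returns nothing, it can help to check which words cross that 50% line. As far as I know the threshold applies to MyISAM natural-language searches, not to boolean mode. Below is a rough Python sketch that only counts in how many rows each word appears; it ignores stopwords and minimum word length, and the function name and sample rows are made up.

```python
import re

def over_threshold_words(rows, threshold=0.5):
    """Return the words that occur in more than `threshold` of the rows."""
    doc_freq = {}
    for text in rows:
        # Count each word once per row, case-insensitively.
        for word in set(re.findall(r"\w+", text.lower())):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return sorted(
        word for word, count in doc_freq.items()
        if count / len(rows) > threshold
    )

# Hypothetical test data: "berlin" is in 3 of 4 rows (75%).
rows = ["Ort - Berlin", "Hotel Berlin", "Berlin Zoo", "Hamburg Hafen"]
print(over_threshold_words(rows))  # ['berlin']
```

On a table like this a natural-language search for Berlin would score every row at zero, so add more non-matching rows or test in boolean mode.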
Your query worked perfectly for me. Please check whether your table is MyISAM, because before MySQL 5.6 full-text search only works with the MyISAM engine.