My question is a little similar to "Extract specific words from text field in mysql", but not the same.
I have a text field with words inside. In my language a word can have many different endings. I need to find these endings.
I use MySQL's fulltext search, but I would need access to the index data, where each field is "cut" into words and the words are counted. I could then search for "test*" and quickly find "test", "tested", "testing". My primary goal is a list of all the endings that exist in my database.
As it is, I can get the records that contain specific "test*" words, but I need more than the location of each occurrence in the field: I need to group the matches so I get a list of all the words that, for example, start with "test". I don't need to know which records they are in, just a list, grouped so that "testing" is written only once instead of 10 times (a counter of how many times it was found would be nice, but is not necessary).
Is there a way to extract this information from a fulltext-indexed field, or should I explode all these fields into words, build an index table full of words, and then just run a LIKE 'word%' query and group the results? I am not sure how to do that in practice either, so please just point me in the right direction.
So to summarize: I have a text field and I need to find out which words in it start with "test", like "tested", "test", "testing" etc. It doesn't make much sense in English, but in my language it does, because the same word takes different endings and there are many of them, sometimes 20. I need to find out which ones are present so I can build a synonyms table ;-)
UPDATE:
Database has columns ID (int), ingredients (text) and recipe (text).
Data in ingredients are cooking ingredients with different endings like:
1 egg
2 eggs
etc.
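To make the "explode into words and group" idea concrete, here is a minimal Python sketch of it. It is not taken from the question: the function name `words_with_prefix` and the sample rows are hypothetical, and the `\w+` tokenizer is an assumption that may need adjusting for a particular language.

```python
import re
from collections import Counter

def words_with_prefix(texts, prefix):
    """Count every distinct word in `texts` that starts with `prefix`."""
    counts = Counter()
    for text in texts:
        # \w+ matches runs of Unicode letters, digits and underscores
        for word in re.findall(r"\w+", text.lower()):
            if word.startswith(prefix):
                counts[word] += 1
    return counts

# Hypothetical sample rows standing in for the `ingredients` column
rows = ["1 egg", "2 eggs", "3 eggs and 1 eggplant", "Egg whites"]
for word, n in sorted(words_with_prefix(rows, "egg").items()):
    print(word, n)
```

Each distinct word appears once with its count, which is the grouped list the question asks for. For a large table the same grouping would more likely be done in SQL over a separate word table.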
You can dump all the words that are present in a full-text index. The dump can also show the frequency of each word, e.g. that "test" is used 200 times and "testing" is used 300 times.
Manual for that: http://dev.mysql.com/doc/refman/5.0/en/myisam-ftdump.html
Related
I need to parse a table of names, create a mysql table for them and insert them. Some people have 2 or maybe even more first names/last names. The file only has full names as long strings and no information where the first name stops and last name begins. Initially I wanted to create only 2 VARCHAR fields but now I am not sure anymore. Should I go with 1 field and FULLTEXT index on it instead? I want to be able to search fast by individual first or last names. I use 10.1.38 MariaDB.
We have a large table with product information. Almost all the time we need to find product names that contain specific words, but unfortunately these queries take forever to run.
Example: Find all the products where the name contains the words "steel" and "102" (not necessarily next to each other, so a product like "Ninja steel iron 102 x" is a match, and so is "Dragon steel 102 b").
Currently we are doing it like this:
SELECT columns FROM products WHERE name LIKE '%WORD1%' AND name LIKE '%WORD2%' (the number of LIKE words is normally 2-4, but in theory it can be 7-8 or more).
Is there a faster way of doing this?
We are only matching words, so I wonder if that can help somehow (i.e. the products in the example above are matches, but "Samurai swordsteel 102 v" is not a match since "steel" doesn't stand alone).
My own thought is to make a helper table containing the words from the product names, and then use that table to get the ids of the matching products.
i.e. a table like: [id, word, productid] so we get for example:
1, samurai, 3
2, swordsteel, 3
3, 102, 3
4, v, 3
Just wonder if there is a built in way to do this in MySQL, so I don't have to implement my own stuff + maintain two tables.
Thanks!
Unfortunately, you have wild cards at the beginning of the pattern name. Hence, MySQL cannot use a standard index for this.
You have two options. First, if the words are really keywords/attributes, then you should have another table, with one row per word.
If that is not the case, you can try a full text index. Note that MySQL has a setting for the minimum word length and uses a stop word list. You should take both into account before building the index.
I am building a database to store answers to some questions, with three tables: Answer, Tag and Tagmap. An answer record can have multiple tags, which are used for searching. Tagmap links Answer and Tag.
The application lets the user enter a search string, e.g. "2014 math part a". I use explode in PHP to split the string into an array, then build my SQL statement with the LIKE keyword. With a LIKE on "a", probably all records will be returned. What is the proper way to search for the corresponding answer records? Sorry for my English!
You should ignore inputs that are too short, e.g. less than 3 chars. So "a" would be ignored, but you could search for "aaa". You should also exclude common words "with no meaning", like "the" in English or "der", "die", "das" in German.
So if a user enters "2014 math part a", I would only search for "2014", "math" and "part".
You should also think about giving the user the possibility to select some tags, to reduce the number of answers in which you search for your keywords before you run the "expensive" LIKE '%keyword%' search.
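The filtering described in this answer could look roughly like the Python sketch below. The stop word set, the minimum length of 3, and the name `search_keywords` are illustrative assumptions, not something the answer prescribes.

```python
# Illustrative stop words only; a real list depends on the language of the data
STOP_WORDS = {"the", "a", "an", "and", "of", "der", "die", "das"}
MIN_LENGTH = 3  # assumed threshold: inputs shorter than this are ignored

def search_keywords(user_input):
    """Split the input on whitespace and keep only the words worth searching for."""
    keywords = []
    for word in user_input.lower().split():
        if len(word) >= MIN_LENGTH and word not in STOP_WORDS:
            keywords.append(word)
    return keywords

print(search_keywords("2014 math part a"))
```

Only the surviving keywords would then be turned into LIKE conditions, ideally as bound parameters rather than by concatenating them into the SQL string.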
I have a table for some companies that may have many branches in different countries. These countries are inserted in countries field.
Now, I have to make a searching system that allows users to find companies that have any branch in a specific country.
My question is: which one should I use, MATCH AGAINST or LIKE? The query must search all records to find completely matching items.
Note: a record may contain several country names separated by commas.
The MATCH AGAINST clause is used in Full Text Search.
For this you need to create a full text index on the search column, countries.
A full text index search is much faster than a LIKE '%country%' search.
I would change the implementation: having a field that contains multiple values is a bad idea. For example, it is difficult to maintain: how would you remove a country from a company?
I believe that a better approach would be to have a separate table companies_countries which will have two columns: company_id and country_id, and could have multiple lines per company.
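As a rough illustration of that layout, the Python sketch below builds the three tables in an in-memory SQLite database, used here only as a stand-in for MySQL. The `companies` and `countries` tables, their columns, the sample rows, and the helper `companies_in` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- One row per (company, country) pair, so a company can have many lines
    CREATE TABLE companies_countries (
        company_id INTEGER REFERENCES companies(id),
        country_id INTEGER REFERENCES countries(id),
        PRIMARY KEY (company_id, country_id)
    );
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO countries VALUES (1, 'Germany'), (2, 'France');
    INSERT INTO companies_countries VALUES (1, 1), (1, 2), (2, 2);
""")

def companies_in(country):
    """Return the names of companies with a branch in `country`."""
    sql = """
        SELECT c.name
        FROM companies c
        JOIN companies_countries cc ON cc.company_id = c.id
        JOIN countries co ON co.id = cc.country_id
        WHERE co.name = ?
        ORDER BY c.name
    """
    return [row[0] for row in conn.execute(sql, (country,))]

print(companies_in("France"))
# Removing a country from a company is a single-row delete
conn.execute("DELETE FROM companies_countries WHERE company_id = 1 AND country_id = 2")
print(companies_in("France"))
```

With this layout the search is an exact comparison on an indexable column, so neither LIKE nor MATCH AGAINST is needed.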
You should use LIKE, because, as @Omesh mentioned, the MATCH AGAINST clause is used for Full Text Search, and Full Text Search needs the entire column for searching.
Is there a way to select entries from fulltext index in MySQL?
No, not that I know of. It would be a great feature though.
I built a search interface with autocomplete on top of MySQL. I run a daily job that scans all columns in all tables that I want to search in, extract words with regular expressions, then store the words in a separate table. I also have a many-to-many table with one column to hold the id of the object, and one column to hold the id of the word so as to record the fact that "word is part of text belonging to object".
The autocomplete works by taking the words typed into the box, and then generating a query that goes like:
SELECT obj.title
FROM obj_word
INNER JOIN obj
ON obj_word.obj_id = obj.id
INNER JOIN word
ON obj_word.word_id = word.id
WHERE word.word IN ('word1', 'word2', 'word3') -- generated dynamically, word1 etc are typed by the user
GROUP BY obj.id
HAVING COUNT(DISTINCT word.id) = 3 -- the 3 is generated, because user typed 3 words.
This works fairly well for me, but I don't have huge amounts of data to work with.
(The actual implementation is slightly fancier, because the last word is matched with LIKE to allow partial matches.)
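For readers who want to try the query, here is a small self-contained Python sketch that runs it against an in-memory SQLite database, used as a stand-in for MySQL. The table and column names follow the query above; the column types, the sample titles, the whitespace tokenizer, and the helper `find_titles` are assumptions. The partial match on the last word is left out.

```python
import sqlite3

def find_titles(conn, words):
    """Return titles of objects whose text contains every word in `words`."""
    unique = sorted(set(w.lower() for w in words))
    placeholders = ", ".join("?" for _ in unique)
    sql = f"""
        SELECT obj.title
        FROM obj_word
        INNER JOIN obj ON obj_word.obj_id = obj.id
        INNER JOIN word ON obj_word.word_id = word.id
        WHERE word.word IN ({placeholders})
        GROUP BY obj.id, obj.title
        HAVING COUNT(DISTINCT word.id) = ?
        ORDER BY obj.title
    """
    return [row[0] for row in conn.execute(sql, (*unique, len(unique)))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obj (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE obj_word (obj_id INTEGER, word_id INTEGER,
                           PRIMARY KEY (obj_id, word_id));
""")

# Hypothetical sample data; a plain split() stands in for the regex extraction job
titles = ["Ninja steel iron 102 x", "Dragon steel 102 b", "Samurai swordsteel 102 v"]
for title in titles:
    obj_id = conn.execute("INSERT INTO obj (title) VALUES (?)", (title,)).lastrowid
    for w in set(title.lower().split()):
        conn.execute("INSERT OR IGNORE INTO word (word) VALUES (?)", (w,))
        word_id = conn.execute("SELECT id FROM word WHERE word = ?", (w,)).fetchone()[0]
        conn.execute("INSERT INTO obj_word VALUES (?, ?)", (obj_id, word_id))

print(find_titles(conn, ["steel", "102"]))
```

The words are passed as bound parameters and the HAVING count is derived from the number of distinct words, so typing the same word twice does not break the match.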
EDIT:
I just learned that the myisam_ftdump utility may be used to extract a list of words from the index file. The command line goes something like this:
myisam_ftdump -d film_text 1 > D:\tmp\out.txt
Here, -d means dump (get a list of all entries), film_text is the name of a MyISAM table with a full text index, and 1 is an ordinal identifying which index you want to dump.
I must say, the utility works, but I am not sure it is fast enough to use for pulling a live list for autocompletion. You could of course have a periodic job that runs the command and dumps the output to a file. Unfortunately this dumps index entries, not individual, unique words.
My hunch is you could use this utility as a means to extract the words, but it will need processing to turn it into a proper autocomplete list.