MySQL fulltext search: searching simple words like "part a"

I am building a database to store answers to some questions, with three tables: Answer, Tag and Tagmap. An answer record can have multiple tags, which are used for searching. Tagmap links Answer and Tag.
The application lets the user enter a search string, e.g. "2014 math part a". I use explode in PHP to split the string into an array, then build my SQL statement with the LIKE keyword. With a LIKE on a, probably all records will be returned. What is the proper way to search for the corresponding answer records? Sorry for my English!

You should ignore inputs which are too short, e.g. fewer than 3 characters. So a would be ignored, but you could search for aaa. You should also exclude common words "with no meaning", like the in English or der, die, das in German.
So if a user enters 2014 math part a, I would only search for 2014, math and part.
You should also think about giving the user the possibility to select some tags, to reduce the number of answers in which you search for your keywords before you run the "expensive" LIKE '%keyword%' search.
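The filtering described above can be sketched as follows. This is only an illustration in Python (the question uses PHP); the stopword list, the 3-character threshold and the function names are assumptions, not part of the original answer.

```python
# Hypothetical stopword list; extend it for the languages your users type in.
STOPWORDS = {"the", "and", "for", "der", "die", "das"}
MIN_LENGTH = 3  # assumed threshold: shorter terms are ignored


def filter_keywords(search_string):
    """Split the input on whitespace and drop short words and stopwords."""
    words = search_string.lower().split()
    return [w for w in words if len(w) >= MIN_LENGTH and w not in STOPWORDS]


def like_pattern(keyword):
    """Escape the LIKE wildcards (% and _) and wrap the keyword in % signs."""
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return "%" + escaped + "%"


keywords = filter_keywords("2014 math part a")
print(keywords)  # ['2014', 'math', 'part']
print([like_pattern(k) for k in keywords])
```

Each resulting pattern would then be bound as a parameter to one LIKE condition, rather than being concatenated into the SQL string.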

Related

Searching table efficiently for specific phrase

I want to search an entire column for a specific phrase. I know I can use the SQL statement of SELECT description FROM questions WHERE description LIKE '%what%' AND description LIKE '%if%' to search for the phrase "What if" in the description column. My problem is that if I have a million entries, then searching through the column might take a while.
Is there a way to search through an entire column efficiently to check if a specific phrase exists?
Full-text search provides exactly what you want. If you create a FULLTEXT index on the column you search, MySQL can look up your phrase in that column efficiently, and it returns a relevance score for each match, which you can use to rank your results.
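As a rough sketch of what that could look like, the snippet below builds the index statement and a phrase query for the questions table from the question. The index name ft_description and the %s placeholder style (used by common Python MySQL drivers) are assumptions. Note also that very short or very common words may be skipped by the index, depending on the server's minimum word length and stopword settings.

```python
# Hypothetical index name; run this statement once to create the index.
CREATE_INDEX_SQL = (
    "ALTER TABLE questions ADD FULLTEXT INDEX ft_description (description)"
)


def build_fulltext_query(phrase):
    """Return (sql, params) for an exact-phrase FULLTEXT search."""
    # In boolean mode, double quotes around the search string request a
    # phrase match. Drop quotes typed by the user so they cannot break it.
    cleaned = phrase.replace('"', " ").strip()
    boolean_phrase = '"' + cleaned + '"'
    sql = (
        "SELECT description, "
        "MATCH(description) AGAINST (%s IN BOOLEAN MODE) AS score "
        "FROM questions "
        "WHERE MATCH(description) AGAINST (%s IN BOOLEAN MODE) "
        "ORDER BY score DESC"
    )
    return sql, (boolean_phrase, boolean_phrase)


sql, params = build_fulltext_query("What if")
print(sql)
print(params)
```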

Select the records containing one or more words fully in UPPERCASE

I have a question about a query in a MySQL database. I have a table order_det, and its column remarks_desc contains entries as follows:
Table structure:
Table: order_det
Columns: rec_id, remarks_desc
Sample records in order_det table
rec_id remarks_desc
_________________________________________________________
1 a specific PROGRAMMING problem
2 A software Algorithm
3 software tools commonly USED by programmers
4 Practical, answerable problems that are unique to the programming profession
5 then you’re in the right place to ask your question
6 to see if your QUESTION has been asked BEFORE
My requirement: I want to select only the records that contain one or more words written entirely in uppercase letters. From the 6 records above, I want to select only records 1, 3 and 6:
rec_id remarks_desc
__________________________________________________
1 a specific PROGRAMMING problem (it contains one all uppercase word PROGRAMMING)
3 software tools commonly USED by programmers (it contains one all uppercase word USED)
6 to see if your QUESTION has been asked BEFORE (it contains two all uppercase words QUESTION and BEFORE)
I tried to achieve this using LIKE and REGEXP, but I am getting incorrect results.
Please help me to get the correct result.
Try:
SELECT rec_id, remarks_desc FROM order_det WHERE remarks_desc REGEXP '(^|[[:blank:]])[[:upper:]][[:upper:]]+([[:blank:]]|$)'
I have assumed that you want to exclude single-letter capitalised words. If you want to exclude capitalised words at the start of the string, you'll need to tweak the regex.
Make sure that the collation of your table (or column) is case-sensitive (_cs or _bin, not _ci).
I used information from http://dev.mysql.com/doc/refman/5.1/en/regexp.html#operator_regexp
However, if you're having to use regular expressions to extract data from a database, it's worth considering whether your database design could be improved.
This is particularly important if you need good performance from the database.
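To sanity-check the pattern outside the database, here is a small Python sketch that applies the same idea to the sample rows from the question. It is an approximation: [A-Z] and [ \t] stand in for [[:upper:]] and [[:blank:]], so it assumes ASCII letters only.

```python
import re

# Same idea as the REGEXP above: a run of two or more uppercase letters,
# bounded by a space/tab or by the start/end of the string.
UPPERCASE_WORD = re.compile(r"(^|[ \t])[A-Z][A-Z]+([ \t]|$)")

ROWS = [
    (1, "a specific PROGRAMMING problem"),
    (2, "A software Algorithm"),
    (3, "software tools commonly USED by programmers"),
    (4, "Practical, answerable problems that are unique to the programming profession"),
    (5, "then you’re in the right place to ask your question"),
    (6, "to see if your QUESTION has been asked BEFORE"),
]


def rows_with_uppercase_word(rows):
    """Return the rec_id of every row containing an all-uppercase word."""
    return [rec_id for rec_id, text in rows if UPPERCASE_WORD.search(text)]


print(rows_with_uppercase_word(ROWS))  # [1, 3, 6]
```

Python's re module is case-sensitive by default, which is why no collation concern arises here; in MySQL the collation decides that.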
Here is a pretty straightforward stored function which returns the number of uppercase words in a row.
Cons:
it's a stored function, not pure SQL;
it uses COLLATE;
it uses REGEXP, but feel free to get rid of it by using another inner loop instead;
it counts all words, but you can add a break once you reach 2.
Please find the function on the following link (gist.github.com). It doesn't display correctly here.

How does SQL Server full-text Index actually index the words in a catalog?

I have been searching long and hard for an answer regarding the processes full-text index uses to index the full text catalogs assigned to a document, where the document primary key is included in the indexing. I have not been able to find the MSDN article that describes this in depth.
Why can't I use it for searching int-only strings in the full text search columns?
SEE HERE: (WELL, I'm a new user so I remade the columns myself since I can't post an image)
ID FIRSTNAME LASTNAME ADDRESS FULLTEXTCOLUMN
1 JOHN DOE 123 Main St. 1 JOHN DOE 123 Main St.
2 JANE DOE 124 Summer St. 2 JANE DOE 124 Summer St.
^ ----------^ --can't search
For example, in this link, the author of the post shows that he has included the primary key int-only indexes in his full text-indexed column --- but why? After trying a CONTAINS() search on the int values for myself, the search can't find anything without text attached.
So why do so many people show integer-only strings in their catalog if they aren't searchable? I have a huge need for integer-search options in my catalog, and hope I'm just missing something.
Does this mean that the indexes are only assigned to strings that contain at least one letter?
This question here describes a catalog format very close to what I'm trying to cheat the system to do in SQL Server (because this is my only database option).
Thanks for your help!
And yes, this is my closed question here but I don't care because it is a "real" question and important one. I have a team of people behind me wondering this.
After a lot of testing (and after turning off the StopList, which had prevented the integer-only strings from being indexed, something the author of the linked article DID do), it looks like it indexes every string delimited by a space, including the integers. I believe the words that are actually indexed go into the catalog together with information about the row, the column, and the position within that column.
Everything in that entire table gets a relational index, as long as it is space-delimited.

Is MySQL FULLTEXT best solution for partial words?

I have a MySQL MyISAM table containing entries that describe airports. This table contains 3 varchar columns - code, name and tags.
code refers to the airport's code (like JFK and ORD), the name refers to the airport's name (John F Kennedy and O'Hare) and tags specify a semicolon separated list of tags that are associated with the airport (like N.Y.C;New York; and Chicago;).
I need to be able to lookup an airport (for an autocomplete) by either the code, name or tags, therefore I set a FULLTEXT index on (code, name, tags).
I have encountered two problems with FULLTEXT so far that prevent me from working with it:
1. There is no way to do partial matching, only matching with a postfix wildcard such as term* (is this true?)
2. When a period ('.') is specified in the term to match against, the matching works differently. I am assuming that the period is being parsed in a special way. For example, doing a FULLTEXT search on N.Y.C will not return JFK, although doing the same search on New York will.
Is there any way to overcome these barriers? Otherwise, should I be looking at LIKE matching instead, or at an entirely different storage engine? Thanks!
The best solution I came up with is to use both FULLTEXT and LIKE matching, and to combine the results with UNION.
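A minimal sketch of that combination, assuming the table is called airports (the question does not name it) and a driver that uses %s placeholders. The MATCH column list mirrors the FULLTEXT index on (code, name, tags), and UNION removes rows returned by both halves.

```python
def build_airport_lookup(term):
    """Return (sql, params) combining a FULLTEXT match with a LIKE fallback."""
    # Escape LIKE wildcards so that user input is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    like = "%" + escaped + "%"
    sql = (
        "SELECT code, name, tags FROM airports "  # hypothetical table name
        "WHERE MATCH(code, name, tags) AGAINST (%s) "
        "UNION "
        "SELECT code, name, tags FROM airports "
        "WHERE code LIKE %s OR name LIKE %s OR tags LIKE %s"
    )
    return sql, (term, like, like, like)


sql, params = build_airport_lookup("N.Y.C")
print(sql)
print(params)
```

The LIKE half cannot use the FULLTEXT index, so it still scans the table; for an autocomplete on a small airport list that may be acceptable.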

mysql - extract specific words from text field using full text search

My question is a little similar to "Extract specific words from text field in mysql", but not the same.
I have a text field with words inside. In my language a word can have many different endings. I need to find these endings.
I use MySQL fulltext search, but I would need access to the index itself, where the whole field is "cut" into words and the words are counted. I could then search for "test*" and quickly find "test", "tested", "testing". I need the list of all endings that exist in my database; that is my primary goal.
As it is, I can get the records that contain specific "test*" words, but I need not only to locate the occurrences in the field, but also to group them somehow, so that I get the list of all the words that, for example, start with "test". I don't need to know which record they are in, just a list, grouped so that "testing" is listed only once and not 10 times (maybe with a counter of how many times it was found, but that is not necessary).
Is there a way to extract this information from a fulltext-indexed field, or should I explode all these fields into words, build an index table full of words, and just do a LIKE 'word%' and group by the different results? I am not sure how to do that in practice either, so please just point me in the right direction.
So to summarize: I have a text field and I need to find out which of the words inside start with "test", like "tested", "test", "testing" etc. It doesn't make much sense in English, but in my language it does, as the same word has many different endings, sometimes 20. I need to find out which ones are there so I can build a synonyms table ;-)
UPDATE:
Database has columns ID (int), ingredients (text) and recipe (text).
Data in ingredients are cooking ingredients with different endings like:
1 egg
2 eggs
etc.
You can dump all the words that are present in an index. That would also show the frequency of each word, e.g. test is used 200 times and testing is used 300 times.
Manual for that: http://dev.mysql.com/doc/refman/5.0/en/myisam-ftdump.html
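If dumping the index is not an option, the alternative mentioned in the question (splitting the fields into words and grouping them) can be sketched in application code. This is only an illustration: the sample rows are made up, the function name is hypothetical, and in practice the texts would come from the ingredients column.

```python
import re
from collections import Counter


def words_with_prefix(texts, prefix):
    """Count each distinct word that starts with the given prefix."""
    prefix = prefix.lower()
    counts = Counter()
    for text in texts:
        # \w+ is Unicode-aware in Python 3, so non-English letters are kept.
        for word in re.findall(r"\w+", text.lower()):
            if word.startswith(prefix):
                counts[word] += 1
    return counts


# Made-up sample rows standing in for the ingredients column.
ingredients = ["1 egg", "2 eggs", "3 eggs, beaten"]
print(sorted(words_with_prefix(ingredients, "egg").items()))
# [('egg', 1), ('eggs', 2)]
```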