Fulltext search on many tables - mysql

I have three tables, all of which have a column with a fulltext index. The user will enter search terms into a single text box, and then all three tables will be searched.
This is better explained with an example:
documents
doc_id
name FULLTEXT
table2
id
doc_id
a_field FULLTEXT
table3
id
doc_id
another_field FULLTEXT
(I realise this looks stupid but that's because I've removed all the other fields and tables to simplify it).
So basically I want to do a fulltext search on name, a_field and another_field, and then show the results as a list of documents, preferably with what caused that document to be found, e.g. if another_field matched, I would display what another_field is.
I began working on a system whereby three fulltext search queries are performed and the results inserted into a table with a structure like:
search_results
table_name
row_id
score
(This could later be made to cache results for a few days with e.g. a hash of the search terms).
This idea has two problems. The first is that the same document can be in the search results up to three times with different scores. Instead of that, if the search term is matched in two tables, it should have one result, but a higher score.
The second is that parsing the results is difficult. I want to display a list of documents, but I don't immediately know the doc_id without a join of some kind; however the table to join to is dependent on the table_name column, and I'm not sure how to accomplish that.
Wanting to search multiple related tables like this must be a common thing, so I guess what I'm asking is: am I approaching this the right way? Can someone tell me the best way of doing it, please?
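Both problems (one result per document, and a join whose target depends on table_name) can be modeled in a few lines of Python. The sketch below collapses (table_name, row_id, score) rows into one entry per doc_id. It sums the scores and records which tables matched. The lookup dictionaries standing in for table2 and table3 are hypothetical sample data. Summing scores from different indexes is only a heuristic.

In SQL the same idea is usually written as one query per table, each joined to its own table to fetch doc_id, combined with UNION ALL and then grouped by doc_id with SUM(score).

```python
from collections import defaultdict

# Hypothetical lookups standing in for the joins: each maps a table's row id
# to the doc_id it belongs to. In "documents" the row id already is the doc_id.
ROW_TO_DOC = {
    "documents": lambda row_id: row_id,
    "table2": {10: 1, 11: 2}.get,
    "table3": {20: 1, 21: 3}.get,
}


def merge_results(search_results):
    """Collapse (table_name, row_id, score) rows into one entry per doc_id.

    Scores of the tables that matched the same document are summed, and the
    matching (table, row) pairs are kept so the UI can show why it was found.
    """
    merged = defaultdict(lambda: {"score": 0.0, "matched": []})
    for table_name, row_id, score in search_results:
        doc_id = ROW_TO_DOC[table_name](row_id)  # the join depends on table_name
        entry = merged[doc_id]
        entry["score"] += score
        entry["matched"].append((table_name, row_id))
    # Highest combined score first
    return sorted(merged.items(), key=lambda item: item[1]["score"], reverse=True)
```

Here document 1, matched through `documents` and `table2`, ends up as a single result whose score is the sum of both hits.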

I would create a denormalized single index. That is, put all three document types into a single table with fields for doc_id, doc_type and a single fulltext block. Then you can search all three document types at once.
You might also find that Lucene would make sense in this situation. It gives you faster searching, as well as much more functionality around how the searching and scoring works.
The downside is that you're keeping a separate denormalized copy of the text for each record. The upside is that searching is much faster.
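A rough Python model of that denormalized layout follows. It shows the three tables flattened into (doc_id, doc_type, text) rows, with every type searched in one pass.

The word-overlap score is only a stand-in for MySQL's MATCH ... AGAINST ranking, which works differently. The function names and sample rows are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a full-text parser."""
    return re.findall(r"\w+", text.lower())


def build_index(documents, table2, table3):
    """Flatten the three source tables into (doc_id, doc_type, text) rows."""
    rows = [(doc_id, "documents", name) for doc_id, name in documents]
    rows += [(doc_id, "table2", text) for _row_id, doc_id, text in table2]
    rows += [(doc_id, "table3", text) for _row_id, doc_id, text in table3]
    return rows


def search(rows, query):
    """Search every document type at once; score = number of query words found."""
    terms = set(tokenize(query))
    hits = []
    for doc_id, doc_type, text in rows:
        score = len(terms & set(tokenize(text)))
        if score:
            hits.append((score, doc_id, doc_type, text))
    return sorted(hits, reverse=True)  # best score first
```

Because doc_type travels with every row, the result list can say which kind of record matched. For example, a hit might be "table2: budget report" for document 1.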

Related

Get mySQL full text match score for strings not in the table (optimally in a mixed result set with matches from the table)?

This must be a niche scenario, since I have not been able to find a similar question, and in my brief testing in MySQL Workbench, simply using the string in place of the column name did not work.
E.g.:
SELECT MATCH ('fork') AGAINST ('user entered text about forks' IN NATURAL LANGUAGE MODE);
Doesn't work...
I have a query that returns matches on a full-text index, with the relevance score as one of the columns returned. In this app, I am looking for "search suggestions" in a suggestions table that is built from the website's search index content. The user side also stores everything they search for in their local browser storage.
Currently, I have front end code that uses regex to pull matches from their local storage search history (up to 5) and then sends what they typed (as they type) to the back end to get the best matches from the suggestions table.
The way it works now is that the (up to 5) history matches are shown first, and then the rest are filled in from the back end, up to 10 total matches. What I would prefer is to send the history matches to the back end and include them in the FT match query in some way. The result set would then contain all matched suggestions from the table plus the history matches sent from the front end, all sorted by the full-text match relevance score. The new way may result in no history matches showing, or in more than 5 history matches showing; it would all boil down to the relevance score.
Is something like this possible? The only other way I can imagine doing this is to create a temporary table with a full-text index on the fly, join that table in my current query, and then remove the temp table when it's done. The problem with that, in my mind, is that this all happens in real time as the user types, so I don't want to add something like that if it's going to bog down the response time. Is there a fast/optimal way of doing this? Is there a way that would also remove the temporary table when the query ends?
Or is there some other command that can just give me a score based on string value against what the user typed in like what I tried above?
EDIT:
It looks like my temporary table idea could work:
https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html
I'll just have to see what kind of performance impact this has. I'm still interested to hear thoughts on whether this is the best / only way or if there is a better one.
The CREATE TEMPORARY TABLE route was the way to go here. I tested it out and it's working.
Worth noting for future travelers: I had to switch my main table from InnoDB to MyISAM for this to work. I was able to mix and match the MyISAM temp table with the InnoDB main table, but the scoring algorithms are different, so the InnoDB matches were taking priority due to higher scores. This was not an issue for me, as I really did not need or use transactions for the primary suggestions table, so I just made them both MyISAM.
Another item of note is that I had to change how the match runs. I now split the user's query into "words", append the "*" truncation operator to each, and run the match as a boolean search instead of natural language. In the case of the temp table, a user would likely have entered similar searches, so most of the words appeared in 50% or more of the rows and no matches were returned. Boolean search works around this. Again, not a big deal for my particular use case.
Had I needed to stay with InnoDB for this, it would have been a problem, because from what I can tell, there is no way to set a full-text index on an InnoDB temporary table.
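The query rewrite described in that answer (split the input into words and run a boolean-mode prefix search) can be sketched in Python as follows. The operator list reflects MySQL's documented boolean-mode operators. The function name and the choice to strip operators rather than escape them are assumptions.

```python
import re

# Characters with special meaning in MySQL boolean-mode full-text queries.
BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def to_prefix_boolean_query(user_text):
    """Turn free text into a boolean-mode query where each word is a prefix match.

    Operator characters are replaced by spaces so user input cannot change the
    query's meaning; each remaining word gets a trailing '*' (truncation operator).
    """
    cleaned = BOOLEAN_OPERATORS.sub(" ", user_text)
    return " ".join(word + "*" for word in cleaned.split())
```

The resulting string should still be sent as a bound parameter, e.g. `MATCH(col) AGAINST(%s IN BOOLEAN MODE)`, rather than concatenated into the SQL. For example, `to_prefix_boolean_query("forks about")` yields `"forks* about*"`.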

How to search inside a SQL table for a phrase

I am currently using MySQL but I am willing to migrate if necessary to any solution suggested.
I am looking for an easy way to implement a search on a table.
The table has multiple entries with data similar to what will be found on user accounts, like names, addresses, phone numbers and a text column that contains comments of arbitrary length.
I want to make a search that goes over all rows and columns and finds the best-matching row. Correcting slight misspellings would be nice (not very important), but most important is the ability to search across all columns at once.
Table can have as many as 20,000 rows.
Search parameter will be for example: "Company First Name"
Expected results:
company|Contact First Name|Address|...|...
example 2, slightly misspelled search parameters : "Pinaple Street Compani"
Expected results row:
company|pinapple street|..|...
companie|pinapple street|..|...
company|pinaple street|..|...
EDIT:
I forgot to clarify that multiple searches will be done at the same time, so it has to be fast (around 100 searches at the same time). Also, the data is not in English, and the database is utf8 with support for non-English characters.
The misspelling problem is hard, if not impossible, to solve well in pure MySQL.
The multiple-column FULLTEXT search isn't so bad.
Your query will look something like this ...
-- column names are illustrative; list the text columns you want to search
SELECT Company, FirstName, LastName
FROM book
WHERE MATCH(Company, FirstName, LastName, Address, Comments)
AGAINST('search terms' IN NATURAL LANGUAGE MODE);
It will produce a bunch of results, ordered by what MySQL guesses is the most likely hit first. MySQL's guesses aren't great, but they're usually adequate.
You'll need a FULLTEXT index on exactly the same list of columns as your MATCH() clause. Creating that index looks like this.
-- same column list as the MATCH() above
ALTER TABLE book
ADD FULLTEXT INDEX Fulltext_search_index_1
(Company, FirstName, LastName, Address, Comments);
Comments in your question notwithstanding, you just need an index for the group of columns which you will search.
20K rows won't be a big burden on any recent-vintage server hardware.
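Misspelling: You could try SOUNDEX(), but it's an early-20th-century algorithm designed to look up people's names in American English. It was patented by Robert C. Russell and Margaret K. Odell and later used to index the US Census. It deliberately lumps similar-sounding names together, so it produces many false-positive hits, and it really is dumber than a bucket of rocks.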
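A sketch of the standard four-character American Soundex in Python shows both sides of that judgment. MySQL's SOUNDEX() is a variant; among other differences, it does not truncate the code to four characters. Treat this as an illustration of the idea, not a reimplementation of the MySQL function.

```python
# Standard American Soundex digit groups; vowels, y, h and w get no digit.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word):
    """Return the 4-character American Soundex code of `word`."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit after it counts again
        if len(code) == 4:
            break
    return code.ljust(4, "0")
```

"Pinaple", "Pinapple" and "Pineapple" all map to P514, and "Compani" and "Company" both map to C515, which is the behavior the question wants. But "Robert" and "Rupert" also share R163, which is the false-positive problem. Non-ASCII letters pass `isalpha()` but get no digit, so they behave like vowels. That matters for the non-English data in the question.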
If you really do need spell correction you may need to investigate Sphinx.

Best MySQL search query for multiple keywords in multiple columns

The problem here is that I have multiple columns:
| artist | name | lyrics | content
I want to search in these columns by multiple keywords. The problem is that I can't make any good algorithm with LIKE and OR/AND.
The best option I have found is to search for each keyword in each column. But that way I get results that may contain one keyword in name without containing the second keyword in artist.
I want everything combined with AND, but that only works when I'm searching a single column. Otherwise, to get a result, every column must contain all the keywords...
Does anyone know what algorithm I should use so that a search with 3 keywords (e.g. 1 for artist and 2 for name) finds the correct result?
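The condition the question describes requires every keyword to appear in at least one of the columns: AND across keywords, OR across columns. A Python sketch that builds such a WHERE clause for a DB-API driver using `%s` placeholders (as MySQL drivers such as PyMySQL do) follows. The function names are hypothetical. Column names are interpolated directly, so they must come from a fixed whitelist, never from user input. As the answers point out, `LIKE '%...%'` cannot use an ordinary index and scans every row.

```python
def escape_like(term):
    """Escape LIKE wildcards so the keyword matches literally.

    Backslash is MySQL's default LIKE escape character.
    """
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_keyword_filter(keywords, columns):
    """Return (where_clause, params): every keyword must appear in at least one column."""
    groups = []
    params = []
    for keyword in keywords:
        pattern = "%" + escape_like(keyword) + "%"
        groups.append("(" + " OR ".join(f"{col} LIKE %s" for col in columns) + ")")
        params.extend([pattern] * len(columns))
    return " AND ".join(groups), params
```

For example, `build_keyword_filter(["queen", "bohemian"], ["artist", "name"])` produces `(artist LIKE %s OR name LIKE %s) AND (artist LIKE %s OR name LIKE %s)`. That finds a song whose artist contains one keyword and whose name contains the other.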
The best solution is not to use MySQL for the search, but use a text-indexing tool like Apache Solr.
You could then run queries against Solr like:
name:"word" AND artist:"otherword"
It's pretty common to use Solr for indexing data even if you keep the original data in MySQL.
Not only would it give you the query flexibility you want, but it would run hundreds of times faster than using LIKE '%word%' queries on MySQL.
Another alternative is to use MySQL's builtin fulltext indexing feature:
CREATE FULLTEXT INDEX myft ON mytable (artist, name, lyrics, content);
SELECT * FROM mytable
WHERE MATCH(artist, name, lyrics, content)
AGAINST ('+word +otherword' IN BOOLEAN MODE);
But it's not as flexible if you want to search for words that occur in specific columns, unless you create a separate index on each column.
AND works for displaying multiple rows too; it just depends on the rows you have in your table, which you haven't provided. P.S. I'm sorry if my answer is not clear; I don't have the reputation to make it a comment.

Aggregate most relevant results with MySQL's fulltext search across many tables

I am running fulltext queries on multiple tables on MySQL 5.5.22. The application uses InnoDB tables, which do not support FULLTEXT indexes in MySQL 5.5, so I have created a few MyISAM tables specifically for fulltext searches.
For example, some of my tables look like
account_search
===========
id
account_id
name
description
hobbies
interests
product_search
===========
id
product_id
name
type
description
reviews
As these tables are solely for fulltext search, they are denormalized. Data can come from multiple tables and is aggregated into the search table. Besides the ID columns, the rest of the columns all belong to a single FULLTEXT index.
To work around the "50%" rule with fulltext searches, I am using IN BOOLEAN MODE.
So for the above, I would run:
SELECT *, MATCH(name, type, description, reviews) AGAINST('john') as relevance
FROM product_search
WHERE MATCH(name, type, description, reviews) AGAINST('john*' IN BOOLEAN MODE) LIMIT 10;
SELECT *, MATCH(name, description, hobbies, interests) AGAINST('john') as relevance
FROM account_search
WHERE MATCH(name, description, hobbies, interests) AGAINST('john*' IN BOOLEAN MODE) LIMIT 10;
Let's just assume that we have products called "john" as well :P
The problems I am facing are:
To get meaningful relevance, I need to use a search without IN BOOLEAN MODE. This means that the search is subject to the 50% rule and the word length rules. So, quite often, if most of the products in the product_search table are called john, their relevance is returned as 0.
Relevances between multiple queries are not comparable. (I think a relevance of 14 from one query does not equal a relevance of 14 from another different query).
Searches will not be just limited to these 2 tables, there are other "object types", for example: "orders", "transactions", etc.
I would like to be able to return the top 7 most relevant results of ALL object types given a set of keywords (1 search box returns results for ALL objects).
Given the above, what are some algorithms, or perhaps even better ideas, for getting the top 7?
I know I can use things like Solr and Elasticsearch. I have already tried them and am in the process of integrating them into the application, but I would like to be able to provide search for those who only have access to MySQL.
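The "50% rule" mentioned in this question works like this in MyISAM natural-language searches: a word present in 50% or more of the rows counts as too common and matches nothing. The rule does not apply IN BOOLEAN MODE. The Python sketch below finds such words in a set of rows.

The function name is hypothetical. The word splitting is a simplification that ignores MySQL's minimum word length and stopword list.

```python
import re


def common_words(rows):
    """Words that occur in 50% or more of `rows` (MyISAM natural-language threshold)."""
    doc_freq = {}
    for text in rows:
        for word in set(re.findall(r"\w+", text.lower())):  # count each row once
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word for word, count in doc_freq.items() if count * 2 >= len(rows)}
```

With only two rows, every word that appears at all is in at least half of them. This is why tiny MyISAM test tables often return no natural-language matches. It is also why a product table where most rows are called "john" scores "john" as 0.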
So after thinking about this for a while, I decided that the relevance ranking has to be done with 1 query within MySQL.
This is because:
Relevance between separate queries can't be compared.
It's hard to combine the contents of multiple searches together in meaningful ways.
I have switched to using 1 index table dedicated to search. Entries are inserted, removed, and updated in response to inserts, removals, and updates to the real underlying data in the InnoDB tables (this is all automatic).
The table looks like this:
search
==============
id //id for the entry
type //the table the data came from
column //column the data came from
type_id //id of the row in the original table
content //text
There's a full text index on the content column. It is important to realize that not all columns from all tables will be indexed, only things that I deem to be useful in search has been added.
Thus, it's just a simple case of running a query to match on content, retrieve what we have and do further processing. To process the final result, a few more queries would be required to ask the parent table for the title of the search result and perhaps some other meta data, but this is a workable solution.
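The post-processing step can be sketched in Python. The rows come from one MATCH query against the shared content index, so their scores are comparable. The sketch collapses them to one result per source row and keeps the top 7. Keeping each source row's best score, rather than summing its per-column scores, is an assumption, and the tuple layout is hypothetical.

```python
import heapq


def top_results(matches, limit=7):
    """Collapse search-table matches to one result per source row; keep the best `limit`.

    `matches` are (type, column, type_id, relevance) tuples from a single MATCH
    query on the shared `content` index, so relevance values are comparable.
    """
    best = {}
    for obj_type, column, type_id, relevance in matches:
        key = (obj_type, type_id)
        if key not in best or relevance > best[key][0]:
            best[key] = (relevance, column)  # remember which column matched best
    ranked = heapq.nlargest(limit, best.items(), key=lambda item: item[1][0])
    return [(obj_type, type_id, column, relevance)
            for (obj_type, type_id), (relevance, column) in ranked]
```

The returned (type, type_id) pairs are what the follow-up queries use to ask each parent table for titles and other metadata.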
I don't think this approach will really scale (updates and inserts will need to update this table as well), but I think it is a pretty good way to provide decent application wide search for smaller deployments of the application.
For scalability, use something like elastic search, solr or lucene.

Correct indexing when using OR operator

I have a query like this:
SELECT fields FROM table
WHERE field1='something' OR field2='something'
OR field3='something' OR field4='something'
What would be the correct way to index such a table for this query?
A query like this takes an entire second to run! I have 1 index with all 4 of those fields in it, so I'd think MySQL would do something like this:
Go through each row in the index thinking this:
Is field1 something? How about field2? field3? field4? Ok, nope, go to the next row.
You misunderstand how indexes work.
Think of a telephone book (the equivalent of a two-column index on last name first, first name last). If I ask you to find all people in the telephone book whose last name is "Smith," you can benefit from the fact that the names are ordered that way; you can assume that the Smiths are organized together. But if I ask you to find all the people whose first name is "John" you get no benefit from the index. Johns can have any last name, and so they are scattered throughout the book and you end up having to search the hard way, from cover to cover.
Now if I ask you to find all people whose last name is "Smith" OR whose first name is "John", you can find the Smiths easily as before, but that doesn't help you at all to find the Johns. They're still scattered throughout the book and you have to search for them the hard way.
It's the same with multi-column indexes in SQL. The index is sorted by the first column, then sorted by the second column in cases of ties in the first column, then sorted by the third column in cases of ties in both the first two columns, etc. It is not sorted by all columns simultaneously. So your multi-column index doesn't help to make your search terms more efficient, except for the left-most column in the index.
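The phone-book analogy can be modeled with Python's bisect module. A sorted list of (last, first) tuples plays the role of a two-column index. This models the access pattern only, not how MySQL actually stores a B-tree. The names are hypothetical.

```python
from bisect import bisect_left

# A two-column "index": entries sorted by (last_name, first_name), like the phone book.
phone_book = sorted([
    ("Jones", "John"), ("Smith", "Anna"), ("Smith", "John"),
    ("Brown", "John"), ("Adams", "Zoe"),
])


def find_by_last(book, last):
    """Leftmost column: binary search jumps to the contiguous run of matches."""
    i = bisect_left(book, (last,))  # (last,) sorts before every (last, first)
    matches = []
    while i < len(book) and book[i][0] == last:
        matches.append(book[i])
        i += 1
    return matches


def find_by_first(book, first):
    """Second column only: the sort order does not help, so every entry is checked."""
    return [entry for entry in book if entry[1] == first]
```

An OR of the two conditions still needs the full scan from `find_by_first`, which is why the multi-column index does not rescue the OR query.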
Back to your original question.
What would be the correct way to index such a table for this query?
Create a separate, single-column index on each column. One of these indexes will be a better choice than the others, based on MySQL's estimation of how many I/O operations the index will incur if it is used.
Modern versions of MySQL also have some smarts about index merging, so the query may use more than one index in a given table, and then try to merge the results. Otherwise MySQL tends to be limited to use one index per table in a given query.
Another trick that a lot of people use successfully is to do a separate query for each of your indexed columns (which should use the respective index) and then UNION the results.
SELECT fields FROM table WHERE field1='something'
UNION
SELECT fields FROM table WHERE field2='something'
UNION
SELECT fields FROM table WHERE field3='something'
UNION
SELECT fields FROM table WHERE field4='something'
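Index merge and the UNION trick both do the same thing: one cheap lookup per single-column index, then a de-duplicating union. The Python model below uses one hash index per column (value to set of row ids). The function names and sample rows are hypothetical.

One caveat on the SQL side: UNION, as opposed to UNION ALL, removes duplicate result rows by comparing the selected values. So include a key column in the select list if two distinct rows could have identical values.

```python
from collections import defaultdict


def build_column_indexes(rows, n_columns):
    """One hash index per column: value -> set of row ids (like single-column indexes)."""
    indexes = [defaultdict(set) for _ in range(n_columns)]
    for row_id, values in rows.items():
        for col, value in enumerate(values):
            indexes[col][value].add(row_id)
    return indexes


def any_column_equals(indexes, value):
    """field1 = v OR field2 = v OR ...: one lookup per index, then a union."""
    result = set()
    for index in indexes:
        result |= index.get(value, set())  # .get does not insert into the defaultdict
    return result
```

A row that matches in two columns appears only once in the result, just as it would with UNION.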
One final observation: if you find yourself searching for the same 'something' across four fields, you should consider whether all four fields are actually the same thing, and whether you're guilty of designing a table that violates First Normal Form with repeating groups. If so, perhaps field1 through field4 belong in a single column in a child table. Then it becomes a lot easier to index and query:
SELECT fields from table INNER JOIN child_table ON table.pk = child_table.fk
WHERE child_table.field = 'something'
In addition to the previous answer:
Some RDBMSs, such as MySQL and PostgreSQL, can use an index merge if the optimizer thinks it's a good idea.
So you can create a separate index for each field, or create some composite indexes such as (field1, field2) and (field3, field4). Finally, you should try several different solutions and choose the one with the best EXPLAIN plan.