Search function - mysql

I am building a simple search facility.
The idea is that it will search the fields of code, title, description and category.
It's quite simple to search the category and code fields, as each is just one word (%code%).
However, I'm unsure how to break the title and description down so I can search for any keywords the user enters.
Does anyone have any good techniques for this?
Thanks.

Given the limited amount of info:
If you're using MySQL's MyISAM storage engine, you can add FULLTEXT indexes and run a FULLTEXT search over them; see the MySQL manual's full-text search section for more information.
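For the MyISAM route, the setup is just an index plus MATCH ... AGAINST in the query. A minimal sketch, assuming a hypothetical items table with the four fields from the question:

```sql
-- Hypothetical table; FULLTEXT here requires the MyISAM engine
CREATE TABLE items (
    id INT AUTO_INCREMENT PRIMARY KEY,
    code VARCHAR(32),
    category VARCHAR(64),
    title VARCHAR(255),
    description TEXT,
    FULLTEXT (title, description)
) ENGINE = MyISAM;

-- Natural-language full-text search over both columns at once
SELECT id, title
FROM items
WHERE MATCH (title, description) AGAINST ('user keywords here');
```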
If, however, you're using InnoDB (which I also use on my databases), you can't enable that directly in MySQL.
You have a few options. You can split up the keywords yourself, search for entries matching one or more of those keywords, and afterwards check how many keywords matched to order the results. You can also fold that into the query itself, but then you'd need a query per keyword and a parent query to combine the results.
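The first option (split the keywords yourself and order by how many matched) can be sketched without any FULLTEXT support. A small, runnable illustration using SQLite as a stand-in; the items table and its columns are made up for the example:

```python
import sqlite3

def keyword_search(conn, keywords):
    """Rank rows by how many of the given keywords appear in the
    title or description.  Table and column names are illustrative."""
    # One LIKE test per keyword; each test evaluates to 0 or 1 in
    # SQLite, so their sum is the number of keywords that matched.
    tests = " + ".join("(title LIKE ? OR description LIKE ?)" for _ in keywords)
    params = [p for kw in keywords for p in (f"%{kw}%", f"%{kw}%")]
    sql = f"""
        SELECT id, title, score FROM
            (SELECT id, title, ({tests}) AS score FROM items)
        WHERE score > 0
        ORDER BY score DESC"""
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, description TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, "Red widget", "A small red widget"),
    (2, "Blue widget", "Plain and simple"),
    (3, "Gadget", "Unrelated entry"),
])
print(keyword_search(conn, ["red", "widget"]))
# → [(1, 'Red widget', 2), (2, 'Blue widget', 1)]
```

Rows matching more keywords sort first, which is the "check afterwards how many keywords matched for the ordering" idea folded into one query.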
Another option, and the one I finally chose for its performance and flexibility, is to use a Solr server together with PHP's solr client (see the PHP manual on it). The Solr server will index the database given a few (fairly simple) configuration files and allows full-text searches on any indexed field. More information about setting up a Solr server can be found in the Solr tutorial.
There are, of course, many, many other methods and tools. The above are just a few that I've used in the past or am still using (I'm really happy with Solr, but that's personal preference, I guess).
Good luck.

What you want is not something MySQL does very well. Yhn mentioned some options.
MySQL's FULLTEXT indexes are not popular for good reasons.
Breaking your texts down to keywords and forming indexed tables of them that link back to the original items can work. But doing that, in essence, is like starting to build your own search engine.
Much better search engines than you are likely to build are available. Yhn mentioned SOLR, which is very good, but I want to mention also Sphinx Search, which I use. SOLR has some interesting features that Sphinx doesn't have, but I had the impression Sphinx is easier to learn and get started with. It's worth your consideration.

Related

Proper way to implement near-match searching MySQL

I have a table on a MySQL database that has two (relevant) columns, 'id' and 'username'.
I have read that MySQL, and relational databases in general, are not optimal for searching for near matches on strings, so I wonder: what is the industry practice for implementing simple, non-exact-match search functionality, for example when one searches for accounts by name on Facebook and non-exact matches are shown? I found Apache Lucene when researching this, but it seems to be used for indexing pages of a website, not necessarily arbitrary strings in a database table.
Is there an external tool for this use case? It seems like any SQL query for this task would require a full scan, even if it was simply looking for the inclusion of a substring.
In your situation I would recommend using Elasticsearch instead of a relational database. This search engine is a powerful tool for implementing search and analytics functionality.
Elasticsearch is also flexible and versatile, with a rich query language expressed in JSON and support for many different types of data.
And of course it supports near-match searching. As you said, MySQL and other relational databases aren't recommended for near-match searching; they weren't built for that purpose.
--------------UPDATE------------
If you want to do full-text search in a relational database, it's possible, but you might have trouble scaling if your number of users grows a lot. Keep in mind that Elasticsearch is robust and powerful, so you can do many kinds of searches very easily with it, but it can also be more expensive.
When I proposed Elasticsearch I was thinking about scaling the search. But I've been thinking about your problem since I answered, and I understand now that you only need a simple full-text search. To conclude: in the beginning you can do that with just the relational database, and move your search to Elasticsearch later if it needs to scale or your searches become complex.
Follow this guide to do full-text search in PostgreSQL: http://rachbelaid.com/postgres-full-text-search-is-good-enough/
There's another example for MySQL: https://sjhannah.com/blog/2014/11/03/using-soundex-and-mysql-full-text-search-for-fuzzy-matching/
Like I said in the comments, it's a trade-off you must make. You can use Elasticsearch from the beginning, or choose a relational database and move to Elasticsearch in the future.
I also recommend the book Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. I'm reading it at the moment, and it will help you understand this topic.
--------------UPDATE------------
To implement near-match searching in Elasticsearch you can use a fuzzy matching query. The fuzzy query lets you control how lenient the matching should be; for example, take the query below:
{
  "query": {
    "fuzzy": {
      "username": {
        "value": "julienambrosio",
        "fuzziness": 2
      }
    }
  }
}
It will return usernames close to "julienambrosio", such as "julienambrosio1", "julienambrosio12" or "juliembrosio".
You can adjust the level of fuzziness to control how lenient/strict the matching should be.
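"fuzziness": 2 is an edit distance: a stored value matches if it is within two single-character insertions, deletions, or substitutions of the query value. A plain-Python Levenshtein distance shows why the examples above match (this illustrates the concept only; it is not Elasticsearch's actual implementation):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance
    (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# All three example usernames are within edit distance 2 of the query value:
for candidate in ["julienambrosio1", "julienambrosio12", "juliembrosio"]:
    print(candidate, levenshtein("julienambrosio", candidate))
```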
Before building this example you should study Elasticsearch a bit more; there are plenty of courses on Udemy, YouTube, and elsewhere.
You can read more about it in the official docs.

MySQL InnoDB Text Search Options

Knowing full well that my InnoDB tables don't support FULLTEXT searches, I'm wondering what my alternatives are for searching text in tables? Is the performance really that bad when using LIKE?
I see a lot of suggestions saying to make a copy of the InnoDB table in question in a MyISAM table, run queries against THAT table, and match keys between the two, and I just don't think that's a pretty solution.
I'm not opposed to using some 3rd-party solution, though I'm not a huge fan of that. I'd like to explore more of what MySQL can do on its own.
Thoughts?
If you want to do it right you probably should go with Lucene or Sphinx from the very start.
It will allow you to keep your table structure.
You'll get a huge performance boost (think ahead).
You'll get access to a lot of fancy search functions.
Both Lucene and Sphinx scale amazingly well (Lucene powers Wikipedia and Digg; Sphinx powers Slashdot).
A LIKE can only use an index when there is no leading %; doing LIKE '%foo%' on a large table is a huge performance hit. If I were you, I'd look into Sphinx. It can build its index by slurping data out of MySQL using a query that you provide; it's pretty straightforward and was designed to solve your exact problem.
There's also Solr, which is an HTTP wrapper around Lucene, but I find Sphinx a little more straightforward.
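The index-building query that Sphinx runs against MySQL lives in sphinx.conf. A minimal sketch (not a complete config; host, credentials, table, and paths are all placeholders):

```
source items_src
{
    type      = mysql
    sql_host  = localhost
    sql_user  = app
    sql_pass  = secret
    sql_db    = mydb
    # the query Sphinx runs against MySQL to build its index
    sql_query = SELECT id, title, description FROM items
}

index items_idx
{
    source = items_src
    path   = /var/lib/sphinx/items_idx
}
```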
Like others here, I would urge use of Lucene, Sphinx, or Solr.
However, if those are out and your requirements are simple, I've used the steps here to build simple search capability on a number of projects in the past.
That link is for Symfony/PHP, but you can apply the concepts to any language and application structure, assuming an implementation of a stemming algorithm is available. However, if you don't use a data-access pattern where you can hook in to update the index when a record is updated, it's not as easily doable.
Also, a couple of downsides: if you want a single index table but need to index multiple tables, you either have to emulate referential integrity in your DAL, or add an FK column for each different table you want to index. I'm not sure what you're trying to do, so that may rule it out entirely.

Search Short Fields Using Solr, Etc. or Use Straight-Forward DB Index

My website stores several million entities. Visitors search for entities by typing words contained only in the titles. The titles are at most 100 characters long.
This is not a case of classic document search, where users search inside large blobs.
The fields are very short. Also, the main issue here is performance (and not relevance) seeing as entities are provided "as you type" (auto-suggested).
What would be the smarter route?
Create a MySQL table [word, entity_id], have 'word' indexed, and then query using
select entity_id from search_index where word like '[query_word]%'
This obviously requires me to break down each title to its words and add a row for each word.
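The per-word breakdown for that search_index table is the only extra step; a small sketch (the tokenization rule here, lowercased alphanumeric runs, is one reasonable choice rather than a requirement):

```python
import re

def index_rows(entity_id, title):
    """Emit one (word, entity_id) row per distinct word in the
    title, ready to insert into the search_index table above."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return [(word, entity_id) for word in sorted(words)]

print(index_rows(42, "The Quick Brown Fox"))
# → [('brown', 42), ('fox', 42), ('quick', 42), ('the', 42)]
```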
Use Solr or some similar search engine, which from my reading is more oriented towards full-text search.
Also, how will this affect me if I'd like to introduce spelling suggestions in the future?
Thank you!
Pros of a database-only solution:
Less set up and maintenance (you already have a database)
If you want to JOIN your search results with other data or otherwise manipulate them you will be able to do so natively in the database
There will be no time lag (if you periodically sync Solr with your database) or maintenance procedure (if you opt to add/update entries in Solr in real time everywhere you insert them into the database)
Pros of a Solr solution:
Performance: Solr handles caching and is fast out of the box
Spell check - If you are planning on doing spell check type stuff Solr handles this natively
Set up and tuning of Solr isn't very painful, although it helps if you are familiar with Java application servers
Although you seem to have simple requirements, I think you are ultimately going to want some kind of logic around word search; Solr does this very well
You may also want to consider future requirements (what if your documents end up having more than just a title field and you want to assign some kind of relevancy? What if you decide to allow people to search the body text of these entities and/or you want to index other document types like MS Word? What if you want to facet search results? Solr is good at all of these).
I am not sure you would need to create an entry for every word, versus just doing a '%[query_word]%' search, if you are going to create records for each word anyway. It may be simpler to just go with the database for starters, since the requirements seem pretty simple, and it should be fairly easy to scale the database's performance.
I can tell you we use Solr on site and we love the performance and we use it for even very simple lookups. However, one thing we are missing is a way to combine Solr data with database data. And there is extra maintenance. At the end of the day there is not an easy answer.

What are the pros and cons of using full-text search capabilities of MySQL

First of all: I have fruitlessly tried searching Stackoverflow.com for any clues on my problem, however if I have missed anything, please let me know!
In my database I have a table containing metadata (description, etc.) and some other information (file names, etc.) about a number of files.
I'd like to provide the users with the ability search for files among these which matches a search query on both filename and description.
What would be the best solution for this problem, should I use the full-text search functions of MySQL or is there a better way of solving this problem? Any performance issues to take into consideration?
Thanks in advance!
MySQL's full-text search makes your searches more reliable and flexible. It's a "natural language search", meaning you can build up rules that the search should adapt to, e.g. ordering, word boundaries, and so on. Moreover, the full-text search is adaptable; as an example, full-text query expansion learns "synonyms" during the search.
Some cons are that inserting large datasets into a table with a full-text index is performance-heavy, and that it can only be used on MyISAM tables.
In your case I would absolutely consider using the full text search, especially for the descriptive column in the table.
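The query-expansion behaviour mentioned above is a one-clause addition to the query. A sketch against a hypothetical files table with the filename and description columns from the question (the MATCH columns must be covered by a FULLTEXT index):

```sql
-- The first pass finds matches for 'database'; a second pass is then
-- automatically run, seeded with terms from the top first-pass hits
SELECT file_name, description
FROM files
WHERE MATCH (file_name, description)
      AGAINST ('database' WITH QUERY EXPANSION);
```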
Besides the performance and scalability issues, the main drawback is that to use full-text search in MySQL you must use MyISAM as the storage engine, which is quite rare in most businesses nowadays.
Anyway, I would strongly recommend you try out Sphinx: http://www.sphinxsearch.com/. It is very easy to set up and integrate with MySQL. Prominent users of Sphinx include Craigslist and MySQL.

Efficient Filtering / Searching

We have a hosted application that manages pages of content. Each page can have a number of customized fields, and some standard fields (timestamp, user name, user email, etc).
With potentially hundreds of different sites using the system -- what is an efficient way to handle filtering/searching? Picture a grid view that you want to narrow down. You can filter on specific fields (userid, date) or you can enter a full-text search.
For example, "all pages started by userid 10" would be a pretty quick query against a MySQL database. But things like "all pages started by a user whose userid is 10 and matches [some search query]" would suck against the database, so it's suited for a search engine like Lucene.
Basically I'm wondering how other large sites do this sort of thing. Do they utilize a search engine 100% for all types of filtering? Do they mix database queries with a search engine?
If we use only a search engine, there's a problem with the delay time it takes for a new/updated object to appear in the search index. That is, I've read that it's not smart to update the index immediately, and to do it in batches instead. Even if this means every 5 minutes, users will get confused when their recently added page isn't immediately listed when they view a simple page listing (say a search query of "category:5").
We are using MySQL and have been looking closely at Lucene for searching. Is there some other technology I don't know about?
My thought is to offer a simple filtering page which uses MySQL to filter on basic fields. Then offer a separate fulltext search page that would present results similar to Google. Is this the only way?
Solr or grassyknoll both provide slightly more abstract interfaces to Lucene.
That said: yes. If you are a primarily content-driven site, providing full-text searching over your data, there is something in play beyond LIKE. While MySQL's FULLTEXT indexes aren't perfect, they might be an acceptable placeholder in the interim.
Assuming you do create a Lucene index, linking Lucene Documents to your relational objects is pretty straightforward: simply add a stored property to the document at index time (this property can be a URL, ID, GUID, etc.). Then searching becomes a two-phase system:
1) Issue the query against the Lucene index (display simple results like the title)
2) Get more detailed information about the object from your relational store by its key
Since instantiation of Documents is relatively expensive in Lucene, you only want to store the fields you actually search on in the Lucene index, as opposed to complete clones of your relational objects.
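The two phases above can be sketched in a few lines of Python; here plain dicts stand in for the Lucene index and the relational store, and the stored key property is an "id" field (all names are illustrative):

```python
def two_phase_search(index_docs, db_rows, query):
    """Phase 1: hit the (stand-in) search index, which stores only the
    key and a display title.  Phase 2: fetch the full record from the
    (stand-in) relational store by that key."""
    hits = [doc for doc in index_docs if query in doc["title"].lower()]
    return [db_rows[doc["id"]] for doc in hits]

index_docs = [{"id": 1, "title": "Intro to Lucene"},
              {"id": 2, "title": "MySQL tuning"}]
db_rows = {1: {"id": 1, "title": "Intro to Lucene", "body": "..."},
           2: {"id": 2, "title": "MySQL tuning", "body": "..."}}
print(two_phase_search(index_docs, db_rows, "lucene"))
# → [{'id': 1, 'title': 'Intro to Lucene', 'body': '...'}]
```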
Don't write-off MySQL so readily!
Implement it using the database, e.g. a select with a LIKE in the where-clause, or whatever.
Profile it, and add indexes if necessary. Roll out a beta so you get real numbers from users' actual data patterns; not all columns might be equally asked after, and so on.
If the performance does suck, that's when you consider other options: tuning your SQL, your database, the machine the database runs on, and finally using another technology stack...
If you want to use MySQL or PostgreSQL, an open-source solution that works great with them is Sphinx:
http://www.sphinxsearch.com/
We are having the same problem and considering Sphinx and Lucene as possible solutions.