How to do a keyword query in MySQL on a large dataset? - mysql

Here is the scenario:
I want to run a keyword query in MySQL against a large dataset, on the order of 10 million rows.
A match simply means that the keyword is a substring of the specified field.
For example, if a row contains the string "A BC DEF" and the keyword is "BC", it matches. It's just that simple, but I want it to be as fast as possible. Because this will be applied to a website's search module (with relatively high concurrency), I don't want users to wait a long time.
Could anyone give me an idea? Thanks a lot!
P.S. I've searched for information about full-text search in MySQL, as well as search engines like Lucene and Sphinx. Which one is better and more appropriate to apply? My web project is based on Java EE. Thanks!

Consider using MySQL Full-Text Search Functions
http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
Then you can use an SQL query like this:
SELECT * FROM articles
WHERE MATCH (title, body) AGAINST ('BC');
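Note that MATCH ... AGAINST requires a FULLTEXT index covering exactly the columns being searched. A minimal sketch, assuming a hypothetical articles table (in MySQL 5.5 only MyISAM supports FULLTEXT; InnoDB gained it in 5.6):

-- Define the index together with the table ...
CREATE TABLE articles (
    id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body  TEXT,
    FULLTEXT (title, body)
) ENGINE=MyISAM;

-- ... or add it to an existing table:
ALTER TABLE articles ADD FULLTEXT ft_title_body (title, body);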

Related

What is the fastest way to search through MySQL database to match strings?

I need to search through around a million records to find out whether an inputted company name is in the database or not, and if it isn't, get suggestions for changing the input based on what does exist in the database. Presumably a FULLTEXT index is the best way to go about this? For example, if the input was 'Some Law Firm LLP' but that is not in the database as a company name, while 'Some Law Firm' is, I want 'Some Law Firm' to be returned as a suggestion. Or if perhaps 'Some New Law Firm' was in the database, I would want that returned.
I have never implemented a FULLTEXT index on a database before, so I don't really know how to create one, or how to write the appropriate query to return input-change suggestions.
I am also worried about the fact that, by default, MySQL's FULLTEXT index does not treat 3-letter strings as words, so how would I account for company names like 'BBC'? I know that I can use something like:
$q = "SELECT company_name FROM wfp_contacts2 WHERE MATCH (company_name) AGAINST ('".stripslashes(str_replace('"', '\\"', $query))."' IN BOOLEAN MODE)";
To test whether there is an exact match in the database, which is fine, but the main point is getting the suggestions. I can't really just use LIKE '%company_name%' because it is far too slow, and speed is imperative: there will be up to 700 inputs to be checked for suggestions at once, and a LIKE with leading wildcards is far too slow on a million records.
Are there any tutorials anyone can recommend so I can get up to speed with FULLTEXT indexing? Also, if there is another way of doing this, I would be grateful to hear it, as this is extremely important and I honestly don't know what to do about it at the moment.
MySQL Info:
MySQL client version: 5.1.41
Storage Engine: MyISAM
Thanks in advance.
You can go with an indexing/document database such as the Solr search engine.
Search results are faster there, and you can also implement spell checking, which helps with giving suggestions for a search word.
For your reference: http://lucene.apache.org/solr/features.html
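If you stay on MySQL's FULLTEXT instead, the 3-letter-word problem from the question is governed by the ft_min_word_len server variable (default 4): shorter words are simply never indexed. A sketch of the usual fix, assuming the MyISAM table from the question; the my.cnf change requires a server restart:

-- In my.cnf:
--   [mysqld]
--   ft_min_word_len = 3

-- Confirm the new value after restarting:
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Rebuild existing FULLTEXT indexes so words like 'BBC' get indexed:
REPAIR TABLE wfp_contacts2 QUICK;

-- Suggestions can then use boolean mode with a trailing wildcard:
SELECT company_name FROM wfp_contacts2
WHERE MATCH (company_name) AGAINST ('+Some +Law +Firm*' IN BOOLEAN MODE);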

Improving MySQL's relevance score using Sphinx for full text search

I am working on an information retrieval system using MySQL's full-text search in natural language mode.
The data I have is annotated with different categories, e.g. monkey, cat, and dog are annotated as 'animals', whereas duck and sparrow are 'birds'. I am retrieving documents based on the occurrences of these tags.
The problem is that MySQL has a limitation: if a particular term occurs in more than 50% of the rows, that term is treated as a stopword and not considered. Given my requirements, I want it to score all the matching terms, even if a particular term occurs in more than 50% of the data.
I have read a few things about combining Sphinx with MySQL for search efficiency, but I am not sure whether this can be applied to my situation.
Please suggest a solution to this problem.
Sphinx is very good at very fast full-text search. It doesn't have the 50% rule that MySQL has, but you will need to use it in place of MySQL's full-text search. Basically, you install Sphinx and set up an import that copies all your MySQL data into Sphinx. Then you can build SphinxSE into MySQL, or query Sphinx directly through a client library, to get your results. You can then fetch the details of your results by querying MySQL.
I use SphinxSE because you can query Sphinx through MySQL and join your MySQL table to the results in a single query, as sketched below. It's quite nice.
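A rough sketch of that SphinxSE pattern (the index name, port, and the documents table are illustrative, not from the answer): you create a table backed by the SPHINX engine whose rows come from searchd, then join it to the real data:

-- The first three columns (id, weight, query) are mandated by SphinxSE;
-- the CONNECTION string points at searchd and the Sphinx index to use.
CREATE TABLE sphinx_results (
    id     BIGINT UNSIGNED NOT NULL,
    weight INT NOT NULL,
    query  VARCHAR(3072) NOT NULL,
    INDEX (query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/documents_idx";

-- One round trip: full-text matching in Sphinx, details from MySQL.
SELECT d.*, s.weight
FROM sphinx_results s
JOIN documents d ON d.id = s.id
WHERE s.query = 'keyword;mode=extended';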

Best way to search for partial words in large MySQL dataset

I've looked for this question on Stack Overflow, but didn't find a really good answer for it.
I have a MySQL database with a few tables containing information about a specific product. When end users use the search function in my application, it should search all of these tables, in specific columns.
Because the joins and the many WHERE clauses were not performing well, I created a stored procedure that splits up all the individual words in these tables and columns and inserts them into a lookup table, as combinations of 'word' and 'productID'.
This table contains now over 3.3 million records.
At the moment I can search pretty quickly if I match on the whole word, or on the beginning of the word (LIKE 'searchterm%'). That is expected, because those queries can use the index.
However, my client wants to search on partial words (LIKE '%searchterm%'). This doesn't perform at all. FULLTEXT search isn't an option either, because it can only match from the beginning of a word, with a wildcard after it.
So what is the best practice for a search function like this?
While more work to set up, using a dedicated fulltext search package like Lucene or Solr may be what you are looking for.
MySQL is not well tailored for text search; use other software for that. For example, use Sphinx to index the data for text search. It will do a great job and is very simple to set up. If you use MySQL 5.1, you can even plug in Sphinx as a storage engine (SphinxSE).
There are other servers that perform text search better than Sphinx, but they are either not free or require other software to be installed.
You can read more in: ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?
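For completeness: the trick these engines use internally can be approximated in plain MySQL by also indexing fixed-length substrings (n-grams) of every word, at the cost of extra storage. A hypothetical sketch building on the question's word/productID lookup table (all names here are assumptions, and populating the trigrams is left to the existing stored procedure):

-- Every 3-character slice of every word, kept in an indexed table:
CREATE TABLE word_trigrams (
    trigram   CHAR(3)      NOT NULL,
    word      VARCHAR(100) NOT NULL,
    productID INT UNSIGNED NOT NULL,
    PRIMARY KEY (trigram, word, productID)
);

-- LIKE '%searchterm%' becomes an indexed trigram lookup plus a re-check:
SELECT DISTINCT productID
FROM word_trigrams
WHERE trigram = 'ear'              -- any 3-char slice of 'searchterm'
  AND word LIKE '%searchterm%';    -- verify the full substring match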

User search: use MySQL LIKE or integrate a search engine (such as Solr, Sphinx, Lucene, etc.)?

In my MySQL database I have a user table consisting of 37,000 (or thereabouts) users.
When a user searches for another user on the site, I perform a simple wildcard LIKE (i.e. LIKE '{name}%') to return the users found.
Would it be more efficient and quicker to use a search engine such as Solr to do my 'LIKE' searches? Furthermore, I believe that in Solr I can use wildcard queries (http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/).
To be honest, it's not that slow at the moment using a LIKE query, but as the number of users grows it'll become slower. Any tips or advice are greatly appreciated.
We had a similar situation about a month ago: our database holds roughly 33k rows, and because our engine was InnoDB we could not use the MySQL full-text search feature (that, and it's quite blunt).
We decided to implement Sphinx (http://www.sphinxsearch.com) and we're really impressed with the results (I'm becoming quite a 'fanboy' of it).
When we ran a large index search with many columns (loads of left joins) over all our rows, we actually halved the query response time compared with the MySQL 'LIKE' counterpart.
Although we haven't used it for long, if you're going to build for future scalability, I'd recommend Sphinx.
You can speed things up by requiring the search word to have a minimum of 3 characters before the search starts, and by indexing your search column with a prefix index of 3 characters.
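A sketch of that suggestion (the users table and name column are assumptions):

-- Index only the first 3 characters of the column:
ALTER TABLE users ADD INDEX idx_name3 (name(3));

-- A trailing-wildcard LIKE on a query of 3+ characters can then
-- use the prefix index to narrow the scan:
SELECT id, name FROM users WHERE name LIKE 'joh%';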
It's actually already built into MySQL: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
We're using Solr for this purpose, since you can search in 1-2 ms even with millions of documents indexed. We're mirroring our MySQL instance with the Data Import Handler and then searching on Solr.
As Neville pointed out, full-text search is built into MySQL, but Solr's performance is way better, since it was born as a full-text search engine.

MySQL full-text or Sphinx or Lucene or anything else?

I am currently using MySQL and have a few tables on which I need to perform boolean search. Given that my tables are InnoDB, I found that one of the better ways to do this is to use Sphinx or Lucene. I have a doubt about using these; my queries are of the following format:
SELECT COUNT(*) AS cnt,
       DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate, '+00:00', :zone), '%Y-%m-%d') AS dat
FROM t_twitter_tracking wrdTrk
WHERE wrdTrk.word LIKE (:word)
  AND wrdTrk.createdOnGMTDate BETWEEN :stDate AND :endDate
GROUP BY dat;
The queries have a date field that needs to be converted to the logged-in user's timezone, and the converted field is then used in the GROUP BY.
Now, if I migrate to Sphinx/Lucene, will I be able to get a result similar to the query above? I am a beginner with Sphinx; which of the two should I use, or is there anything better?
Actually, the GROUP BY and the search on wrdTrk.word LIKE (:word) are the major parts of my query, and I need to move to boolean search to enhance the user experience. My database has approximately 23,652,826 rows, it is InnoDB-based, and MySQL full-text search doesn't work on it.
Regards
Roh
Yes, Sphinx can do this. I don't know exactly what LIKE (:word) matches, but you can run a field-restricted query like @word "exactword" using Sphinx's extended query syntax.
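A rough SphinxQL sketch of the equivalent (the index and attribute names are assumptions; the timezone conversion would have to be precomputed into an attribute at indexing time, since Sphinx has no CONVERT_TZ):

-- SphinxQL speaks the MySQL protocol: mysql -h 127.0.0.1 -P 9306
SELECT day_utc, COUNT(*) AS cnt
FROM twitter_tracking_idx
WHERE MATCH('@word "exactword"')
GROUP BY day_utc;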
You only need to index the data properly and you will get the result.
Since you only need the counts, I believe it would be better for you to keep using MySQL.
If you have a performance problem, I suggest you use EXPLAIN, and possibly better indexing, to improve your queries; a sketch follows below.
You should move to Sphinx/Solr only if full-text search is a major part of your use case.
Read Full Text Search Engine versus DBMS for a more comprehensive answer.
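For the EXPLAIN-plus-indexing route, a sketch against the question's query, with the bind parameters replaced by sample literals (the index name is an assumption; note that a leading-wildcard LIKE cannot use a B-tree index, but a prefix search can):

-- See how MySQL executes the query:
EXPLAIN SELECT COUNT(*) AS cnt,
       DATE_FORMAT(CONVERT_TZ(createdOnGMTDate, '+00:00', '+05:30'), '%Y-%m-%d') AS dat
FROM t_twitter_tracking
WHERE word LIKE 'keyword%'
  AND createdOnGMTDate BETWEEN '2012-01-01' AND '2012-02-01'
GROUP BY dat;

-- A composite index covering both filter columns can help:
ALTER TABLE t_twitter_tracking ADD INDEX idx_word_date (word, createdOnGMTDate);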
Save your count in a meta table and keep it updated, or use MyISAM, which maintains its own row count. (MongoDB also maintains its own count.) Cache the count in memcached. Counting the rows every time you need to know the count is a silly use of resources.
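A minimal sketch of the meta-table idea, kept current with triggers (all names are hypothetical):

-- One row per counter we care about:
CREATE TABLE row_counts (
    name VARCHAR(64) PRIMARY KEY,
    cnt  BIGINT NOT NULL DEFAULT 0
);
INSERT INTO row_counts VALUES ('t_twitter_tracking', 0);

-- Keep it updated as rows come and go:
CREATE TRIGGER trk_ins AFTER INSERT ON t_twitter_tracking
FOR EACH ROW UPDATE row_counts SET cnt = cnt + 1 WHERE name = 't_twitter_tracking';

CREATE TRIGGER trk_del AFTER DELETE ON t_twitter_tracking
FOR EACH ROW UPDATE row_counts SET cnt = cnt - 1 WHERE name = 't_twitter_tracking';

-- Reading the count is now a single indexed point lookup:
SELECT cnt FROM row_counts WHERE name = 't_twitter_tracking';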