On our new shopping site we will use Solr as the search engine. In the Solr index we keep the product IDs and a list of keywords for each product; the search query runs against the keywords.
Solr returns a list of product IDs. These IDs are then inserted into a MySQL query that selects all product data from the database, and MySQL also handles the sorting of the results. For example, the MySQL query might look like:
SELECT * FROM product WHERE id IN (1,4,42,32,46,...,39482) ORDER BY price ASC
We have around 100,000 products on the site. This method works fine when there are a couple of thousand results, but becomes slow when there are, for example, 50,000 results.
My assumption is that the bottleneck is the WHERE ... IN clause. A long-term solution will be to move all product data into Solr so it can handle sorting the results and also apply refinement filters to the search (e.g., perhaps the user only wants to view products in a certain price range). However, we are inexperienced with Solr and need a short-term fix before we can implement this.
One option is to abandon Solr in the short term, store the keywords in a MySQL table, and search against it with a FULLTEXT search.
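Roughly what I have in mind (the table and column names here are only illustrative, and InnoDB FULLTEXT indexes need MySQL 5.6+):

CREATE TABLE product_keyword (
    product_id INT UNSIGNED NOT NULL PRIMARY KEY,
    keywords   TEXT NOT NULL,
    FULLTEXT KEY ft_keywords (keywords)
) ENGINE=InnoDB;

-- search the keywords and let MySQL sort by price in a single query
SELECT p.*
FROM product p
JOIN product_keyword k ON k.product_id = p.id
WHERE MATCH(k.keywords) AGAINST('tooth paste' IN NATURAL LANGUAGE MODE)
ORDER BY p.price ASC
LIMIT 50;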
Am I missing any other options?
The main problem for you is that Solr is going to return the results sorted by the number of matching keywords, but you want the results sorted by price. As you correctly mention, moving all your data to Solr is the best option - you would be very happy with Solr for your searching, sorting, faceting and pagination needs.
For the short term, however, it is well worth just adding the price field to Solr. When you get a search query like "tooth paste" you can issue a Solr query like
q=keywords:(tooth AND paste)&rows=10&fl=id&sort=price%20asc
to get only the first 10 results, and then paginate by specifying the start parameter, like so:
q=keywords:(tooth AND paste)&rows=10&start=10&fl=id&sort=price%20asc
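Since Solr already returns those IDs in price order, the MySQL lookup then only has to fetch the rows for the current page. One way to keep Solr's ordering (a sketch; MySQL's FIELD() function is just one option) is:

SELECT * FROM product
WHERE id IN (42, 7, 105)              -- the page of IDs returned by Solr
ORDER BY FIELD(id, 42, 7, 105);       -- keep them in Solr's price order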
I have a use case where I need to maintain ordered data against a particular field, either in MongoDB or MySQL. Is it possible to do that? I can always retrieve data in order by using an ORDER BY clause, but I need to save the rows in ordered format in the database, something like inserting into a sorted tree. Performance of the database is not a concern.
Up to this point, I had imagined you wanted the data to be stored in the desired order for performance reasons—but then you go on to say...
Performance of database is not a concern.
Which leaves me completely stumped as to what you're trying to accomplish or why.
SQL tables are not "ordered". SQL indexes (which are often implemented as "sorted trees") are. One can build multiple different indexes over the same table. The RDBMS will then use such indexes to improve the performance of queries upon the table, however the underlying data storage is unaffected.
So, the answer to your question as literally posed is "no"—however, all of the benefits (i.e. performance, which apparently is not a concern to you) can be derived through the use of indexes.
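For example (the table and column names here are hypothetical), an index is exactly the "sorted tree" you describe, built alongside the table rather than inside it:

CREATE INDEX idx_item_price ON item (price);
-- the optimizer can walk idx_item_price to return rows already sorted by price,
-- while the rows themselves stay wherever the storage engine put them
SELECT id, price FROM item ORDER BY price ASC LIMIT 100;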
I have three to five search fields in my application and I am planning to integrate this with Apache Solr. I tried to do the same with a single table and it is working fine. Here are my questions.
1. Can we index multiple tables in the same core? Or should I create a separate core for each index (I guess this concept is wrong)?
2. Suppose I have 4 tables: users, careers, education and location. I have two search boxes in a PHP page: one is to search for simple locations (just like an autocomplete box) and the other is to search for a keyword, which should check the careers and education tables. If multiple indexes are possible under a single core:
2.1 How do we define the query here?
2.2 Can we specify the index name in the query (like a table name in MySQL)?
Links which can answer my concerns are enough.
If you're expecting to query the same data as part of the same request, such as auto-completing users, educations and locations at the same time, indexing them to the same core is probably what you want.
The term "core" is probably identical to the term "index" in your usage, and having multiple sets of data in the same index will usually be achieved through having a field that indicates the type of document (and then applying a filter query if you want to get documents of only one type, such as fq=type:location. You can use the grouping feature of Solr to get separate result sets of documents back for each query as well.
If you're only ever going to query the data separately, having them in separate indexes is probably the way to go, as you'll be able to scale, analyse and tune each index independently (and avoid always having to add a filter query for the type of content you're looking for).
Specifying the index name is the same as specifying the core, and is part of the URL to Solr: http://localhost:8983/solr/index1/ or http://localhost:8983/solr/index2/.
There is a page containing diverse items from different MySQL tables (news, articles, video, audio, ...), bound to a certain tag (e.g. "economics").
At the moment, 100 rows bound to the tag are fetched from each table and then grouped and sorted.
I need to introduce pagination on the page, which is a pain in this situation, because one needs to collect all the items together in order to get a chunk at some offset with some limit.
I think I need to aggregate the items from each table into one data source, and then perform the querying (filter by tag) and sorting (by date) on it.
What can I use for this purpose? I am considering the Sphinx search engine, but I'm not sure whether it's a good fit here or not - I only need querying and sorting, not full-text search.
Sphinx is a very good solution for your case. You can define one index for all types of your content (news, articles, video, audio); just add a "source_type" field that identifies the source table, for example 1 - news, 2 - audio, 3 - video, etc., and add all the fields you want to use for filtering.
If you want to find all audio with the tag "rock", you just filter by the "tag" and "source_type" fields. Sphinx does this much faster than MySQL, particularly if you have a very large amount of data, and it returns only a bunch of the found rows (how many depends on max_matches in the Sphinx config).
At the same time Sphinx returns the count of all matches very quickly, and by using LIMIT and OFFSET in your queries to Sphinx you can do pagination.
In that manner you fetch the IDs of the matching objects from Sphinx, and after that fetch all the required data from MySQL - roughly as sketched below.
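For example, in SphinxQL (the index, field and attribute names are only illustrative, assuming "tag" is indexed as a full-text field and source_type/published_at are attributes):

SELECT id FROM content_index
WHERE MATCH('@tag economics') AND source_type = 3
ORDER BY published_at DESC
LIMIT 20, 10;            -- third page, 10 items per page
SHOW META;               -- total_found gives the overall match count for the pager

SELECT * FROM video WHERE id IN (501, 498, 476)    -- the IDs Sphinx returned
ORDER BY FIELD(id, 501, 498, 476);                  -- keep Sphinx's ordering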
I used that approach in the same situation, and it proved very efficient.
I have a very small MySQL database that stores information about goods and users. I am trying to implement a search, by first name and last name, among users who bought some goods. The Sphinx search engine comes highly recommended, so I am using it. My search currently works like this:
1. Search with Sphinx for the IDs of users matching the first name and last name.
2. Search in MySQL (not with Sphinx) for goods matching specific filters (id or category, price, etc.) where user_id is IN the IDs from step 1.
How can I implement this with one JOIN query?
You can't directly, because, as you say, the Sphinx index and the database live in different 'systems'.
So the 'join' is happening in your application. It sounds like you are already implementing what is effectively a join.
But there are two alternatives if you really don't want to continue with that approach:
1) SphinxSE. It's a fake MySQL storage engine: when you make a query against the virtual table, a query is made in the background to the Sphinx index, and the results are presented to MySQL as a table. Because it looks like a MySQL table, MySQL can then join it with your database table(s) to produce the result set, combining the query and the data. (There are still separate systems, but MySQL implements the joining logic - see the sketch after the next option.)
2) Attributes. You can store data in the Sphinx index alongside the full-text index, and Sphinx can return those attributes in result sets. In this way you avoid the need for the join, because you get the search results along with the data (which you would have got from MySQL) in one go.
(In this way you create one big 'denormalized' index.)
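A rough sketch of the SphinxSE route (the index name, port and table/column names are assumptions): create the virtual table once, then join against it like any other table.

CREATE TABLE user_search (
    id     INT UNSIGNED NOT NULL,       -- the three mandatory SphinxSE columns:
    weight INT NOT NULL,                -- document id, match weight, query
    query  VARCHAR(3072) NOT NULL,
    INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/users";

SELECT g.*
FROM user_search s
JOIN goods g ON g.user_id = s.id
WHERE s.query = 'john smith;mode=any'
  AND g.category_id = 5
  AND g.price < 100
ORDER BY g.price;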
I think this is a long-shot....
My database has the following fields: title, description, date, price, hash
At present I generate an MD5 hash like so md5($title.$desc.$date.$price) and place it in the hash field for each item, so that when a new item is added to the database I have an easy and fairly reliable way of knowing whether an item with the same details already exists in the database.
What I would like to do is expand this, so the match process is a little more fuzzy. The reason for this is that I'm seeing lots of duplicate items in the database where the description may be only one or two characters different, or the price might be slightly different.
The database is large (3 million rows) and uses InnoDB. I also have Sphinx at my disposal, if this offers a way of filtering out similar results when they are returned from searches.
Well, Sphinx (or any other 'search engine') would need a similar 'hash' computed to be able to remove duplicates at query time.
Where Sphinx might help you: when you insert an item into the database, use Sphinx to run a search on the database for similar items. You would get a 'ranked' list of potential duplicates. If the top item has a high score, you could say it's sufficiently similar, and then store that fact in the database.
(How I do it is with a second column on the table called 'grouper'; by default it just duplicates the primary key of the item, but if a duplicate is found it is changed to the PK of the item it duplicates. You can then just run a MySQL (or Sphinx!) GROUP BY on that grouper column - roughly as sketched below.)
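A minimal sketch of that grouper idea (the table and column names are hypothetical):

ALTER TABLE item ADD COLUMN grouper INT UNSIGNED NOT NULL;
UPDATE item SET grouper = id;                     -- default: every row is its own group
UPDATE item SET grouper = 42 WHERE id = 1234;     -- row 1234 was found to duplicate row 42
SELECT grouper, MIN(id) AS representative_id, COUNT(*) AS copies
FROM item
GROUP BY grouper;                                 -- duplicates collapse into one row per group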
You could use SOUNDEX on the description (used to cope with slightly different spellings of words).
http://dev.mysql.com/doc/refman/5.6/en/string-functions.html#function_soundex
For the price, if you round it to the nearest 10 (or whatever is reasonable) before creating the MD5, that should cope with small differences.
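Putting both ideas together, the hash generation might become something like this (a sketch only; the exact normalisation, soundex-ing each word of the description and rounding the price to the nearest 10, is an assumption):

// normalise the fields before hashing so near-identical items produce the same hash
$normDesc  = implode(' ', array_map('soundex', str_word_count($desc, 1)));
$normPrice = round($price, -1);   // precision -1 rounds to the nearest 10
$hash = md5($title . $normDesc . $date . $normPrice);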