I am trying to build a database of products that can be searched by many facets (like Newegg or Amazon). At first I was going to try to do the whole thing with MySQL, but further research has led me to believe that is a bad idea, so instead I am thinking about using Sphinx.
My question is: how would I set up the MySQL tables for this? Would I just have one table for the products and another for all the facets, with a couple of large varchar fields and a foreign key to the product?
I am not a huge Sphinx expert, but I'd say that you don't have to stick all your data in one table. Sphinx can handle associations just fine. If you are planning to use Rails for your front-end, then take a look at the thinking_sphinx gem. It definitely allows you to specify attributes based on data spread across many tables. In my experience, I didn't have to change my data structure to accommodate Sphinx.
I'll pipe in.
You don't really need to, actually. Facets in Sphinx are just IDs (at least in 0.9.9, the current stable release). I am going to assume that you have a standard product table with your different facets stored as foreign keys to other tables.
So, assuming you have this, you can just select over the main product table and set up the facets in Sphinx as per the documentation.
I would really need to see your table structure to comment further. It sounds like you have your products spread over multiple tables. In that case, as you mentioned, I would go with a single table to index on, populated with the contents of all the others.
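As a rough sketch of what that could look like (all table and column names here are my own assumptions, not from the question):

```sql
-- Hypothetical schema: one product table, facets in their own tables,
-- joined through foreign keys (names are illustrative only)
CREATE TABLE brands     (id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE categories (id INT PRIMARY KEY, name VARCHAR(100));

CREATE TABLE products (
    id          INT PRIMARY KEY,
    name        VARCHAR(255),
    description TEXT,
    brand_id    INT,  -- facet: references brands(id)
    category_id INT   -- facet: references categories(id)
);

-- A single denormalized SELECT like this is what you would feed to
-- Sphinx's sql_query, indexing over all the tables at once
SELECT p.id, p.name, p.description, b.name AS brand, c.name AS category
FROM products p
JOIN brands b     ON b.id = p.brand_id
JOIN categories c ON c.id = p.category_id;
```

The point is that the normalized tables stay as they are; only the indexing query flattens them.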
The great thing about Sphinx is that you can use a MySQL query to get your data into Sphinx. This allows you to structure your database in a way that's optimized for your business logic, without having to worry about how search will perform. As long as you're creative with the query you write for sql_query, you can normalize your database however you'd like, and still be able to grab all the text to be indexed with a single query. For example, if you need to get strings from a many-to-one relationship into your index, you can do so using a subquery.
sql_query = SELECT p.*, \
    (SELECT pa.text FROM products_attr pa WHERE pa.product_id = p.id) \
    FROM products p
Additionally, if you have drop-downs where you search on attribute IDs, you can use Sphinx's multi-value attribute. This way, you can search by attribute ID as well as by the text of the attribute.
sql_attr_multi = uint attributes from query; \
    SELECT product_id AS id, id AS attribute FROM product_attributes
Related
The problem here is that I have multiple columns:
| artist | name | lyrics | content |
I want to search these columns by multiple keywords, and I can't come up with a good algorithm using LIKE with OR/AND.
The obvious approach is to search for each keyword in each column, but that way I get results that contain one keyword in name yet don't contain the second keyword in artist.
I want everything combined with AND, but that only works when all the keywords are in a single column; otherwise, to get a result, every column would have to contain all of the keywords...
Is there an algorithm such that, when you search with 3 keywords (e.g. 1 for artist and 2 for name), you find the correct result?
The best solution is not to use MySQL for the search, but use a text-indexing tool like Apache Solr.
You could then run queries against Solr like:
name:"word" AND artist:"otherword"
It's pretty common to use Solr for indexing data even if you keep the original data in MySQL.
Not only would it give you the query flexibility you want, but it would run hundreds of times faster than using LIKE '%word%' queries on MySQL.
Another alternative is to use MySQL's builtin fulltext indexing feature:
CREATE FULLTEXT INDEX myft ON mytable (artist, name, lyrics, content);
SELECT * FROM mytable
WHERE MATCH(artist, name, lyrics, content)
AGAINST ('+word +otherword' IN BOOLEAN MODE)
But it's not as flexible if you want to search for words that occur in specific columns, unless you create a separate index on each column.
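For instance, with a separate FULLTEXT index per column (index names here are illustrative), each MATCH() can target a single column and the conditions can be combined with AND:

```sql
-- Assumes MySQL with FULLTEXT support on this table;
-- each MATCH() below must exactly match one index's column list
CREATE FULLTEXT INDEX ft_artist ON mytable (artist);
CREATE FULLTEXT INDEX ft_name   ON mytable (name);

-- Require "otherword" in artist AND "word" in name
SELECT * FROM mytable
WHERE MATCH(artist) AGAINST ('+otherword' IN BOOLEAN MODE)
  AND MATCH(name)   AGAINST ('+word' IN BOOLEAN MODE);
```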
AND works for displaying multiple rows too; it just depends on the rows you have in your table, which you haven't provided. PS: I'm sorry if my answer is not clear; I don't have the reputation to make it a comment.
I have three to five search fields in my application and plan to integrate them with Apache Solr. I tried the same thing with a single table and it is working fine. Here are my questions.
Can we index multiple tables in the same core? Or should I create a separate core for each index (I guess this concept is wrong)?
Suppose I have 4 tables: users, careers, education and location. I have two search boxes in a PHP page: one to search for simple locations (just like an autocomplete box) and another to search for a keyword, which should check the careers and education tables. If multiple indexes are possible under a single core:
2.1 How do we define the query here?
2.2 Can we specify the index name in the query (like a table name in MySQL)?
Links which can answer my concerns are enough.
If you're expecting to query the same data as part of the same request, such as auto-completing users, educations and locations at the same time, indexing them to the same core is probably what you want.
The term "core" is probably identical to the term "index" in your usage, and having multiple sets of data in the same index is usually achieved by having a field that indicates the type of document (and then applying a filter query if you want to get documents of only one type, such as fq=type:location). You can also use Solr's grouping feature to get a separate result set of documents back for each type.
If you're only ever going to query the data separately, having them in separate indexes is probably the way to go, as you'll be able to scale, analyze and tune each index independently (and avoid always needing a filter query to get the type of content you're looking for).
Specifying the index name is the same as specifying the core, and is part of the URL to Solr: http://localhost:8983/solr/index1/ or http://localhost:8983/solr/index2/.
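As a rough sketch (the core names, field name and query terms below are assumptions, not from the question):

```
# one core, documents tagged with a "type" field at index time;
# fq restricts results to a single document type
http://localhost:8983/solr/mycore/select?q=harvard&fq=type:education
http://localhost:8983/solr/mycore/select?q=bost&fq=type:location

# separate cores: the core name in the URL selects the index
http://localhost:8983/solr/locations/select?q=bost
```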
I'll have to create database(s) that store very large amounts of data while still being able to extract data fast enough using MySQL.
I was wondering if it would help to create a new database, or a new set of tables, for each user instead of using a single large database.
My only worry is that there will be many users, but I hope they will not all use the project at the same time.
Does anyone have any experience with similar structures or any other advice to solve the problem?
Use a single database and a single "User" table for all users; allocate them a unique ID.
If a single user has a lot of data, for example if user_a has 10 books, make a separate table like "Book_table" with this structure:
id, user_id, book_name
You can use this structure for multiple users.
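A minimal sketch of that structure (column types and the sample ID are my assumptions):

```sql
CREATE TABLE Book_table (
    id        INT PRIMARY KEY AUTO_INCREMENT,
    user_id   INT NOT NULL,       -- points at the single User table
    book_name VARCHAR(255)
);

-- all books belonging to user_a (assuming user_a has id = 1)
SELECT book_name FROM Book_table WHERE user_id = 1;
```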
Please explain more about what user details you have; I will try to give a more accurate answer.
Definitely use only one User table. If you need to search that table quickly, you can index the columns that need fast searching. For example, say you have a user table with first_name, last_name and email address. These are text fields, so searching would be slow, since it needs to scan the entire table and compare all the strings (note that string comparison is much slower than comparing integers). To get around this you can index a column like email address, which is sort of like a hidden, ordered table containing only emails. Ordered data is much easier to search, so your lookups would be fast.
Be careful with indexes, though, since they add overhead to your inserts. Some reading up on your part will really help here.
Anyway, that is a really oversimplified explanation of indexing.
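In MySQL terms, that could look like the following (table and column names are assumed from the example above):

```sql
-- ordinary B-tree index on the column you search by
CREATE INDEX idx_users_email ON users (email_address);

-- this lookup can now use the index instead of scanning the whole table
SELECT first_name, last_name
FROM users
WHERE email_address = 'someone@example.com';
```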
I have a very small MySQL database that stores information about goods and users. I am trying to implement a search, by firstname and lastname, over users who bought some goods. The Sphinx search engine comes with a lot of good recommendations, so I am using it. Right now my search works like this:
1. Search with Sphinx for the IDs of users matching the firstname and lastname.
2. Search in MySQL (not with Sphinx) for goods matching specific filters (id or category, price, etc.) where user_id is IN the IDs from step 1.
How would I implement this with one JOIN query?
You can't directly because, as you say, the Sphinx index and the database live in different 'systems'.
So the 'join' happens in your application. It sounds like you are already implementing what is effectively a join.
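Concretely, the application-level 'join' is just two queries; the second one might look like this (the schema, filter values and returned IDs are assumptions for illustration):

```sql
-- step 1 (in Sphinx): full-text search on firstname/lastname
-- returned, say, user IDs 17, 42 and 95
-- step 2 (in MySQL): filter goods by those IDs plus the other criteria
SELECT *
FROM goods
WHERE category_id = 3
  AND price < 100
  AND user_id IN (17, 42, 95);
```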
But there are two alternatives if you really don't want to continue with that approach:
1) SphinxSE. It's a fake MySQL storage engine: when you make a query against the virtual table, a query is made in the background to the Sphinx index, and the results are presented to MySQL as a table. Because it is a MySQL table, MySQL can then join it with the database table(s) to produce a result set combining the query and the data. (There are still separate systems, but MySQL implements the joining logic.)
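A hedged sketch of the SphinxSE approach (the index name, port and goods schema are my assumptions):

```sql
-- Virtual table backed by the Sphinx daemon, not by real storage;
-- the id/weight/query columns are the ones SphinxSE expects
CREATE TABLE users_search (
    id     INT UNSIGNED NOT NULL,
    weight INT NOT NULL,
    query  VARCHAR(3072) NOT NULL,
    INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/users_index";

-- MySQL sends the search to Sphinx, then joins the matches to goods
SELECT g.*
FROM users_search s
JOIN goods g ON g.user_id = s.id
WHERE s.query = 'john smith' AND g.category_id = 3;
```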
2) Attributes. Sphinx can store data in the index alongside the full-text index and return those attributes in result sets. This way you avoid the need for the join, because you get the search results together with the data (which you would otherwise have fetched from MySQL) in one go.
(In this way you create one big 'denormalized' index.)
I am currently planning to create a big database (2+ million rows) with a variety of data from separate sources. I would like to avoid structuring the database around auto_increment IDs, to help prevent sync issues with replication, and also because each item inserted will have an alphanumeric product code that is guaranteed to be unique - it seems to make more sense to use that instead.
I am looking at a search engine to index this database with Sphinx looking rather appealing due to its design around indexing relational databases. However, looking at various tutorials and documentation seems to show database designs being dependent on an auto_increment field in one form or another and a rather bold statement in the documentation saying that document ids must be 32/64bit integers only or things break.
Is there a way to have a database indexed by Sphinx without auto_increment fields as the id?
Sure - that's easy to work around. If you need to make up your own IDs just for Sphinx and you don't want them to collide, you can do something like this in your sphinx.conf (example code for MySQL)
source products {
    # Use a MySQL user variable to store a throwaway ID value
    # (note: it must be @id, not #id - "#" starts a comment in MySQL)
    sql_query_pre = SELECT @id := 0

    # Keep incrementing the throwaway ID.
    # "code" is present twice because Sphinx does not full-text index attributes
    sql_query = SELECT @id := @id + 1 AS id, code AS code_attr, code, description FROM products

    # Return the code so that your app will know which records were matched
    # (this will only work in Sphinx 0.9.10 and higher!)
    sql_attr_string = code_attr
}
The only problem is that you still need a way to know what records were matched by your search. Sphinx will return the id (which is now meaningless) plus any columns that you mark as "attributes".
Sphinx 0.9.10 and above will be able to return your product code to you as part of the search results because it has string attributes support.
0.9.10 is not an official release yet, but it is looking great. It looks like Zawodny is running it over at Craigslist, so I wouldn't be too nervous about relying on this feature.
Sphinx only requires IDs to be unique integers; it doesn't care whether they are auto-incremented or not, so you can roll your own logic. For example, generate integer hashes for your string keys.
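For example, MySQL's CRC32() can derive a 32-bit integer ID from the product code right in the indexing query (a sketch only; the table and column names are assumed, and on millions of rows you should watch out for hash collisions):

```sql
-- could be used as sql_query in sphinx.conf;
-- CRC32 returns a 32-bit unsigned integer derived from the string key
SELECT CRC32(product_code) AS id, product_code, description
FROM products;
```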
Sphinx doesn't depend on auto_increment; it just needs unique integer document IDs. Maybe you can add a surrogate unique integer ID to the tables to work with Sphinx. It is well known that integer searches are much faster than alphanumeric searches. BTW, how long is your alphanumeric product code? Any samples?
I think it's possible to generate an XML stream from your data,
and then create the ID in software (Ruby, Java, PHP).
Take a look at
http://github.com/burke/mongosphinx