Thinking sphinx/mysql for non text search - mysql

Where I have a search which has a category (foreign key) and optional text, should I use thinking sphinx to "search" where a search string has not been submitted, solely the category?

It really depends on your use case. Let's say for example you have blog posts, and they have categories a, b, and c.
If you want yoursite.com/a/ to list all posts in category a in order from newest to oldest, then it's probably not the greatest idea to use sphinx/search for that. It will be a simple database query, possibly with pagination.
However, let's say you want that page to list all posts with that category, or that might relate to that category according to the text, and also maybe posts that have tags related to that category. In this case, it is probably best to use a search engine, like sphinx, to power this page. The search engine will be much faster if the equivalent database query is very expensive.
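For the simple listing case described above, a minimal sketch of the plain database query with pagination might look like the following. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the posts table, its columns, and the posts_in_category helper are assumptions made for illustration, and the category is stored as plain text here rather than as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, category TEXT, title TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO posts (category, title, created_at) VALUES (?, ?, ?)",
    [
        ("a", "First post", "2024-01-01"),
        ("b", "Other category", "2024-01-02"),
        ("a", "Second post", "2024-01-03"),
        ("a", "Third post", "2024-01-05"),
    ],
)


def posts_in_category(conn, category, page=1, per_page=2):
    """Return one page of post titles in a category, newest first."""
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT title FROM posts WHERE category = ? "
        "ORDER BY created_at DESC LIMIT ? OFFSET ?",
        (category, per_page, offset),
    ).fetchall()
    return [title for (title,) in rows]


print(posts_in_category(conn, "a", page=1))
print(posts_in_category(conn, "a", page=2))
```

No search engine is involved: an index on the category and date columns is usually enough for this kind of page.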

Related

What is the best practice to write a search query for SOLR for an ecommerce website

This is my first project using SOLR. I have indexed all the products in Solr and created a copyField named searchable. I copied fields like product_title, description, category titles, and filters into this field.
I am using the query given below to get results.
http://localhost:8080/solr/testcore/select?indent=on&q=status:1 AND is_single:0 AND searchable:sleeve+medium AND seller_status:1&wt=json
I am getting matching results, but I have a couple of questions:
Is there any mechanism to sort the results so that exact matches appear on top?
I indexed the quantity and stock status of each product. Can I give a lower weight to products which have quantity = 0 or stock_status = "Out of Stock", so that out-of-stock items always display at the bottom of the search results?
Thank you.
You can find the answer to your first question here.
About the second question, using eDismax you can use the boost query (bq) value:
bq=stock_status:"Out of Stock"^-0.1
In general, though, it is better to boost important documents than to de-boost documents of low importance.
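As a hedged sketch of that "boost the important documents" advice, the request parameters could be assembled like this. The field names come from the question; the boost value of 10, the build_solr_query helper, and the choice to move the status conditions into fq filters are assumptions for illustration. The bq value boosts every document that is not out of stock, which avoids a negative boost (some Solr/Lucene versions reject negative boost values).

```python
from urllib.parse import urlencode, parse_qs


def build_solr_query(text):
    """Build an eDismax query string that favours in-stock products."""
    params = {
        "defType": "edismax",
        "q": text,
        "qf": "searchable",
        # Filters restrict the result set without affecting the score.
        "fq": ["status:1", "is_single:0", "seller_status:1"],
        # Boost everything that is NOT out of stock (boost value is arbitrary).
        "bq": '(*:* -stock_status:"Out of Stock")^10',
        "wt": "json",
    }
    return urlencode(params, doseq=True)


query_string = build_solr_query("sleeve medium")
print("http://localhost:8080/solr/testcore/select?" + query_string)
```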

Finding rows in MySQL Table that do NOT have certain text

Suppose I want to find an article in a database table that includes the text "There were many bison" With phpMyAdmin, I can navigate to Search, choose a field, then choose Like %...%, and it will select the article that includes those words.
I'd like to know if there's a way to find all rows that do NOT include that string.
Let me explain my bigger goal. I'm working on articles about many animal species that are divided into sections on Classification, Distribution, Ecology, etc. Each section can be thought of as an independent article, and I was tempted to make unique tables for each of these sections. However, that would be a logistical nightmare; I'd need literally hundreds of tables.
So I just write one long article with each section beginning with something like this:
So if I have articles about 600 species in my database table, and I want to know which articles DO NOT include an Ecology section, I can simply search for all the rows that do not have that particular div, or something similar (e.g [h2]Ecology[/h2] - though with real tags, not brackets).
Is there a way to do that with phpMyAdmin, MySQL Workbench (which I downloaded and installed just today) or some other tool?
Thanks.
You could use NOT REGEXP (http://dev.mysql.com/doc/refman/5.1/en/regexp.html) in SQL. For a fixed string, NOT LIKE with the same %...% pattern also works.
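A minimal sketch of the negated pattern match, using Python's sqlite3 module as a stand-in for MySQL (the NOT LIKE syntax is the same in both). The articles table layout, the species column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "article_id INTEGER PRIMARY KEY, species TEXT, article TEXT)"
)
conn.executemany(
    "INSERT INTO articles (species, article) VALUES (?, ?)",
    [
        ("bison", "<h2>Distribution</h2>Plains<h2>Ecology</h2>There were many bison"),
        ("wolf", "<h2>Distribution</h2>Forests"),
        ("elk", "<h2>Classification</h2>Cervidae"),
    ],
)

# Select every article whose text does NOT contain the Ecology heading.
missing = conn.execute(
    "SELECT species FROM articles WHERE article NOT LIKE ? ORDER BY species",
    ("%<h2>Ecology</h2>%",),
).fetchall()
missing_species = [species for (species,) in missing]
print(missing_species)
```

The same WHERE clause can be typed into the SQL tab of phpMyAdmin or into MySQL Workbench. Rows where the article column is NULL are not returned by NOT LIKE, so they would need a separate IS NULL check.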
One solution would be to create a categories table in your database and then assign each article a category. That way you could write a query to select all the articles that have (or do not have) a specific category.
An example would be:
table articles:
- article_id (primary key)
- article
- category_cat_id (foreign key that references cat_id)
table category:
- cat_id (primary key)
- cat_name
A query to select all the articles except those with the ecology category:
SELECT * FROM articles
LEFT JOIN category
ON articles.category_cat_id = category.cat_id
WHERE cat_name != 'ecology'
Alternatively, to select all the articles with the ecology category, change the last line to:
WHERE cat_name = 'ecology'
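One detail worth checking with this schema is how articles without any category behave. The sketch below, again using sqlite3 as a stand-in for MySQL and with made-up sample rows, suggests that the != comparison silently drops rows whose category is NULL, so an explicit IS NULL test is needed if those rows should count as "not ecology".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
CREATE TABLE articles (
    article_id INTEGER PRIMARY KEY,
    article TEXT,
    category_cat_id INTEGER REFERENCES category(cat_id)
);
INSERT INTO category VALUES (1, 'ecology'), (2, 'distribution');
INSERT INTO articles VALUES
    (1, 'Bison ecology', 1),
    (2, 'Bison distribution', 2),
    (3, 'Uncategorised notes', NULL);
""")

base = (
    "SELECT articles.article_id FROM articles "
    "LEFT JOIN category ON articles.category_cat_id = category.cat_id "
)

# NULL != 'ecology' evaluates to NULL, so article 3 is filtered out here.
not_ecology = conn.execute(
    base + "WHERE cat_name != 'ecology' ORDER BY 1"
).fetchall()

# Adding IS NULL keeps articles that have no category at all.
not_ecology_or_none = conn.execute(
    base + "WHERE cat_name IS NULL OR cat_name != 'ecology' ORDER BY 1"
).fetchall()

print(not_ecology, not_ecology_or_none)
```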

Migrating from MySQL to MongoDB - best practices

So, it may be best to just try it out and see through some trial and error, but I'm trying to figure out the best way to migrate a pretty simple structure from mysql to mongodb. Let's say that I have a main table in mysql called 'articles' and I have two other tables, one called 'categories' and the other 'category_linkage'. The categories all have an ID and a name. The articles all have an ID and other data. The linkage table relates articles to categories, so that you can have unlimited categories related to each article.
From a MongoDB approach, would it make sense to just store the article data and category ID's that belong to that article in the same collection, thus having just 2 data collections (one for the articles and one for the categories)? My thinking is that to add/remove categories from an article, you would just update($pull/$push) on that particular article document, no?
In my opinion, a good model would look like this:
{"article_name": "name",
"category": ["category1_name", "category2_name", ...],
"other_data": "other data value"
}
That is, embed the category names directly in the article document. Updating an article's categories is easy, but removing a category altogether requires modifying all articles belonging to that category. If removing categories is frequent, then keeping them separate might be a good idea performance-wise.
This approach also makes it easy to query on the category name (no need to map a name to an id with a separate query).
Thus, the "correct" way to model the data depends on the assumed use case, as is typically the case with mongodb and other nosql databases.
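To make the $push/$pull idea concrete, here is a small sketch of the update documents involved. The add_category and remove_category helpers are hypothetical names, and apply_update is only a tiny in-memory imitation of how MongoDB treats these two operators on an array field, so the example runs without a server.

```python
import copy


def add_category(name):
    """Update document that appends a category name to the array."""
    return {"$push": {"category": name}}


def remove_category(name):
    """Update document that removes every occurrence of a category name."""
    return {"$pull": {"category": name}}


def apply_update(doc, update):
    """In-memory stand-in for $push/$pull, for illustration only."""
    result = copy.deepcopy(doc)
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)
    for field, value in update.get("$pull", {}).items():
        result[field] = [v for v in result.get(field, []) if v != value]
    return result


article = {
    "article_name": "name",
    "category": ["category1_name"],
    "other_data": "other data value",
}
article = apply_update(article, add_category("category2_name"))
article = apply_update(article, remove_category("category1_name"))
print(article["category"])
```

With a driver such as pymongo, the same dictionaries would be passed as the update argument of collection.update_one(filter, update). Removing a category from every article would use update_many with the $pull document, which is the expensive case mentioned above.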
If you have access to a Mac computer, you could give the MongoHub GUI a try. It has an "Import from MySQL" feature.

database design for tagging multiple sources (MySQL)

I'm working on a project where I have the following (edited) table structures: (MySQL)
Blog
id
title
description
Episode
id
title
description
Tag
id
text
The idea is that tags can be applied to any Blog or Episode (and to other types of sources), and new tags can be created by the user if they don't already exist in the tag table.
The purpose of the tags is that a user will be able to search the site, and the results will search across all types of material on the site. Also, at the bottom of each blog article/episode description it would have a list of tags for that item.
I haven't thought too much about the search mechanism, but I guess it would be flexible between OR and AND searches, if that has any impact on the choice, and it would probably allow the user to filter the results for particular types of sources.
Originally I was planning to create multiple tag mapping tables:
BlogTag
id
tag_id
blog_id
EpisodeTag
id
episode_id
tag_id
But now I wonder if I would be better off with:
TaggedStuff
id
source_type
source_id
tag_id
Where source_type would be an integer related to whether it was an Episode, Blog, or some other type that I've not included in the structures above, and source_id would be the reference in that particular table.
I'm just wondering what the optimum structure would be for this, the first choice or the second?
In a clean (academic) design you would often see a supertype Resource (or something similar) for Blog and Episode, with its own table, and another table for the tags. Since there is an N:M relationship between Tag and Resource, you have an extra mapping table between them.
So in such a design you would associate the Tag-Entities with your resources by having a relationship to their generalization.
After that you can put general attributes to the generalization. (i.e. title, description)
You can add attributes to the relationship between Tag and Resource, like a counter of how often a specific resource was tagged with a specific tag, or how often a tag was used, and so on (i.e. something like what you see on stackoverflow in the upper right here).
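A rough sketch of the supertype design, using sqlite3 as a stand-in for MySQL. The table names resource and resource_tag, the type column, and the search_by_tags helper are assumptions for illustration; the helper covers both the OR and the AND style of tag search mentioned in the question and assumes the tag names passed in are distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (
    id INTEGER PRIMARY KEY, type TEXT NOT NULL, title TEXT, description TEXT
);
CREATE TABLE tag (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
CREATE TABLE resource_tag (
    resource_id INTEGER NOT NULL REFERENCES resource(id),
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (resource_id, tag_id)
);
INSERT INTO resource VALUES
    (1, 'blog', 'Intro post', 'A blog entry'),
    (2, 'episode', 'Episode 1', 'An episode'),
    (3, 'blog', 'Other post', 'Another blog entry');
INSERT INTO tag VALUES (1, 'mysql'), (2, 'design');
INSERT INTO resource_tag VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")


def search_by_tags(conn, tags, match_all=False):
    """Find resources of any type having any (OR) or all (AND) of the tags."""
    placeholders = ",".join("?" * len(tags))
    sql = (
        "SELECT r.type, r.title FROM resource r "
        "JOIN resource_tag rt ON rt.resource_id = r.id "
        "JOIN tag t ON t.id = rt.tag_id "
        f"WHERE t.text IN ({placeholders}) "
        "GROUP BY r.id "
        "HAVING COUNT(DISTINCT t.id) >= ? "
        "ORDER BY r.id"
    )
    needed = len(tags) if match_all else 1
    return conn.execute(sql, (*tags, needed)).fetchall()


print(search_by_tags(conn, ["mysql", "design"]))
print(search_by_tags(conn, ["mysql", "design"], match_all=True))
```

Filtering by source type would only need an extra condition on the type column.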
The biggest loss in going with structure 2 is loss of referential integrity. If you can say "whatever" to that, it might be easier to go with this structure.
When I say structure 2 I mean:
TaggedStuff
id
source_type
source_id
tag_id
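The referential-integrity point can be illustrated with a small sketch, using sqlite3 with foreign keys switched on as a stand-in for MySQL/InnoDB. The try_insert helper and the sample ids are assumptions for illustration. A per-type mapping table can declare a real foreign key, while the generic source_id column cannot, because it may point into several different tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Blog (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Tag (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE BlogTag (
    id INTEGER PRIMARY KEY,
    tag_id INTEGER NOT NULL REFERENCES Tag(id),
    blog_id INTEGER NOT NULL REFERENCES Blog(id)
);
CREATE TABLE TaggedStuff (
    id INTEGER PRIMARY KEY,
    source_type INTEGER NOT NULL,
    source_id INTEGER NOT NULL,  -- no foreign key is possible here
    tag_id INTEGER NOT NULL REFERENCES Tag(id)
);
INSERT INTO Blog VALUES (1, 'First blog');
INSERT INTO Tag VALUES (1, 'mysql');
""")


def try_insert(sql, params):
    """Return True if the insert succeeds, False on a constraint violation."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Blog 999 does not exist: structure 1 rejects the row, structure 2 accepts it.
blog_tag_ok = try_insert(
    "INSERT INTO BlogTag (tag_id, blog_id) VALUES (?, ?)", (1, 999)
)
tagged_stuff_ok = try_insert(
    "INSERT INTO TaggedStuff (source_type, source_id, tag_id) VALUES (?, ?, ?)",
    (1, 999, 1),
)
print(blog_tag_ok, tagged_stuff_ok)
```

With structure 2, dangling rows like this one have to be prevented or cleaned up by application code.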
If I understand you correctly, the point is to optimize the search mechanism.
So it makes sense to create some kind of index table and denormalize the data there.
I mean something like this:
Url, Type, Title, Search_Field, etc.
where Url is the path to the article or episode, Type is the kind of source (article|episode), Title is what users will see, and Search_Field holds the list of tags and other data that matters for search.
That's why both variants are quite good.

How to store these field descriptions in mysql?

Apologies for the long post, I didn't intend for it to be this long, but it's a pretty simple issue I've been having. :)
Let's say you have a simple table called tags that has columns tag_id and tag. The tag_id is simply an auto increment column and the tag is the title of the tag. If I need to add a description field, that would be around 1-2 paragraphs on average (max around 3-4 paragraphs probably), should I simply add a description field to the table or should I create a new table called tag_descriptions and store the descriptions with the tag_id?
I remember reading that it is better to do this because if you do a query that doesn't select the description, that description field will still slow down mysql. Is this true? I don't even remember where I read that from, but I've been kind of following it for a couple years now... Finally I question if I need to do this, I have a feeling I don't. You'd also need to inner join whenever you need the description field.
Another question I have is, is it generally bad to create new tables that will only hold very few rows at the max? What if this data doesn't fit anywhere else?
I have a simple case below which relates to these two questions.
I have three tables content, tags, and content_tags that make up a many to many relationship:
content
content_id
region (enum column with about 6-7 different values and most likely won't grow later on)
tags
tag_id
tag
content_tags
content_id
tag_id
I want to store a description around 1-2 paragraphs for each tag, but also for each region. I'm wondering what would be the best way to do this?
Option A:
Just add a description column to the tags table
Create a new table for region_descriptions
Option B:
Create a new table called descriptions with fields: id, description, and type
The id would be id of the content or id of the enum field
The type would be whether it is a tag description, or region description (Would use the enum column for this)
Maybe have a primary key on the id and type?
Option C:
Create a new table for tag_descriptions
Create a new table for region_descriptions
Option A seems to be a good choice if adding the description column doesn't slow down mysql select queries that don't need the description.
Assuming the description column would slow down mysql, option B might be a good choice. It also removes the need for a small table with just 6-7 rows that would hold the region descriptions. Although now that I think of it, would it be slow to query this larger table when, originally, getting a region description would only require going through very few rows?
Option C would be ideal if the description columns would slow down mysql and if a small table like region descriptions would not matter.
Maybe none of these options are the best, feel free to offer another option. Thanks.
P.S. What would be an ideal column type to hold data that is usually 1-2 paragraphs, but might be a little more sometimes?
I don't think it really matters if you don't handle thousands of queries per minute. If you are going to have a zillion queries per minute, then I would implement the various options and perform benchmarks for all these options. Based on the results, you can make a decision.
In my (admittedly somewhat uninformed) opinion, it really depends on how much you'll be using both of them.
If properly indexed, that JOIN should not be very expensive. Also, a larger table will be slower. It inhibits caching, and takes longer to access stuff, although indexing seriously mitigates this problem.
If you'll be joining tag names to tag IDs a LOT, and only rarely will be using the descriptions, I'd say go with separate tables. If you'll be using the descriptions more often, go with one table.
For the first part of your question: if you have a tag with an id, a name and a description, you should save it in 1 table.
Now, this query
SELECT name FROM tags WHERE id = 1;
will NOT slow down if you have 1, 2 or 20 extra fields in there.
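One way to check this kind of claim is to look at the query plan. The sketch below uses sqlite3 as a stand-in, so it only illustrates the idea and says nothing definitive about MySQL's storage engines; in MySQL the equivalent check would be EXPLAIN on the same statement. The table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO tags (name, description) VALUES (?, ?)",
    [(f"tag{i}", "long paragraph " * 50) for i in range(1000)],
)

# The plan shows a direct primary-key lookup, even with the large
# description column sitting in the same table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM tags WHERE id = 1"
).fetchall()
detail = plan[0][-1]
name = conn.execute("SELECT name FROM tags WHERE id = 1").fetchone()[0]
print(detail)
print(name)
```

If real traffic is high enough for this to matter, benchmarking both layouts on the actual MySQL setup, as suggested above, is the more reliable test.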