MySQL Ordered Keyword Search

I have two tables currently:
search_matches:
match_id (int) <-- primary key
parent_id (int) <-- foreign-key
word_id (int) <-- foreign-key (to a table filled with words that are unique and have an id)
pos (int) <-- the position of the word in the block of text it comes from
search_words: (update)
word_id (int) <-- primary key
word (varchar ...) <-- the word
(I'm using InnoDB, and my host won't upgrade MySQL, so FULLTEXT is out.)
I'd like my users to be able to search using quotation marks, so that they can search for an exact phrase such as "foo bar".
I've thought of a few ways of doing this, but the least intensive seems to be adding another column:
next_pos (int)
I could then do
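SELECT foo.parent_id FROM
(SELECT m.* FROM search_matches m JOIN search_words w ON w.word_id = m.word_id WHERE w.word = 'foo') AS foo
INNER JOIN (SELECT m.* FROM search_matches m JOIN search_words w ON w.word_id = m.word_id WHERE w.word = 'bar') AS bar
ON (
foo.parent_id = bar.parent_id AND
foo.next_pos = bar.pos
)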
It comes at the cost of storing an extra column and an inner join for each word beyond the first, but it's the best option I've come up with so far. (The idea before this one used one less column, but needed an addition operation inside the ON clause, which I thought might be too expensive as my site grows.)
Is this my best option, or is there another out there? I'm still just playing in staging, so now's the time to make changes.
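A minimal sketch of the adjacency join, using Python's built-in sqlite3 module as a stand-in for MySQL (the dialects differ, so treat it as illustrative only). Instead of a stored next_pos column it uses `b.pos = a.pos + 1` in the join condition, i.e. the addition the question worries about. Because the addition is applied to a row that has already been found, the lookup of the second word is still an equality match that an index on (word_id, parent_id, pos) can serve. The helper names `index_text` and `phrase_search` are hypothetical.

```python
import sqlite3

# SQLite stand-in for the search_words / search_matches schema described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_words (
    word_id INTEGER PRIMARY KEY,
    word    TEXT UNIQUE NOT NULL
);
CREATE TABLE search_matches (
    match_id  INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL,
    word_id   INTEGER NOT NULL REFERENCES search_words(word_id),
    pos       INTEGER NOT NULL
);
-- Lets each word occurrence be found by (word, parent, position).
CREATE INDEX idx_matches ON search_matches (word_id, parent_id, pos);
""")

def index_text(parent_id, text):
    """Store every word of `text` together with its 0-based position."""
    for pos, word in enumerate(text.lower().split()):
        conn.execute("INSERT OR IGNORE INTO search_words (word) VALUES (?)", (word,))
        (word_id,) = conn.execute(
            "SELECT word_id FROM search_words WHERE word = ?", (word,)).fetchone()
        conn.execute(
            "INSERT INTO search_matches (parent_id, word_id, pos) VALUES (?, ?, ?)",
            (parent_id, word_id, pos))

def phrase_search(first, second):
    """Return the parent_ids in which `second` directly follows `first`."""
    rows = conn.execute("""
        SELECT DISTINCT a.parent_id
        FROM search_matches a
        JOIN search_words wa ON wa.word_id = a.word_id AND wa.word = ?
        JOIN search_matches b ON b.parent_id = a.parent_id AND b.pos = a.pos + 1
        JOIN search_words wb ON wb.word_id = b.word_id AND wb.word = ?
        ORDER BY a.parent_id
    """, (first, second)).fetchall()
    return [r[0] for r in rows]

index_text(1, "some text with foo and also with foo bar")
index_text(2, "bar foo")
index_text(3, "foo baz bar")
print(phrase_search("foo", "bar"))  # [1]
```

Longer phrases add one more pair of joins per word, exactly as with the next_pos approach; the difference is only whether the "next position" is stored or computed.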
Update 1:
I'm now considering using the keyword table to narrow down my search and then running LIKE on just the matching rows instead of doing multiple joins. This may be faster still, and it largely eliminates the need for joins. Running LIKE over my entire database would not be practical.
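A sketch of the "narrow first, then LIKE" idea from Update 1, again in SQLite standing in for MySQL. The `documents` and `keywords` tables and the helper names are assumptions for illustration: the keyword lookup picks documents containing every word, and only those rows are checked with LIKE against the full text. Note that LIKE '%foo bar%' is a plain substring test (it would also match "foo barn"), and % or _ typed by a user act as wildcards unless escaped. In MySQL the pattern would be built with CONCAT('%', ?, '%'), since `||` is a logical OR there by default.

```python
import sqlite3

# Hypothetical tables: `documents` holds the full text and `keywords` maps each
# document to the distinct words it contains (a simplified search_matches).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (parent_id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE keywords (
    parent_id INTEGER NOT NULL REFERENCES documents(parent_id),
    word      TEXT NOT NULL,
    PRIMARY KEY (word, parent_id)
);
""")

def add_document(parent_id, body):
    conn.execute("INSERT INTO documents VALUES (?, ?)", (parent_id, body))
    for word in set(body.lower().split()):
        conn.execute("INSERT INTO keywords VALUES (?, ?)", (parent_id, word))

def phrase_search(phrase):
    words = phrase.lower().split()
    marks = ", ".join("?" for _ in words)
    # Step 1 (indexed): documents that contain every word of the phrase.
    candidates = f"""SELECT parent_id FROM keywords WHERE word IN ({marks})
                     GROUP BY parent_id HAVING COUNT(DISTINCT word) = ?"""
    # Step 2: the expensive LIKE only runs against those candidate rows.
    sql = f"""SELECT parent_id FROM documents
              WHERE parent_id IN ({candidates})
                AND body LIKE '%' || ? || '%'
              ORDER BY parent_id"""
    params = (*words, len(set(words)), phrase)
    return [row[0] for row in conn.execute(sql, params)]

add_document(1, "some text with foo and also with foo bar")
add_document(2, "bar and foo but never together")
add_document(3, "nothing relevant here")
print(phrase_search("foo bar"))  # [1]
```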

I really can't understand why you want to do all this manual work. There are tools out there that can simplify it. From what I read, what you want to do is full-text search, and you don't need to build the index yourself.
Have you considered using something like Solr? It works well with any sort of DB, as long as you create an index.

I don't see how you are going to make that search with your current set-up. If as you say you have a table that contains only UNIQUE words from a block of text, how would you expect to correlate this listing of unique words to the actual word placement in the full content? For example say the original content looked like this:
some text with foo and also with foo bar
Would your unique word table look like this?
word_id  word
-------  ----
1        some
2        text
3        with
4        foo
5        and
6        also
7        bar
If so, how are you ever going to find foo and bar as adjacent records?
I assume your database also has the full content somewhere, so why not just search in the content using LIKE?

Related

Common words not showing up in FULLTEXT search results

I am using full-text search on a website I am building, to order users' search results by relevance. This works great, with one problem: when a search term appears in more than 50% of the rows in my table, it is ignored. I am looking for a way to NOT ignore words that appear in more than 50% of the rows of a table.
For example, in my "items" table, it may look something like this:
item_name
---------
poster 1
poster 2
poster 3
poster 4
another item
If a user searches for "poster", the query returns 0 results because it appears too many times in the table. How can I stop this from happening?
I've tried using IN BOOLEAN MODE, but that takes away the functionality I need (which is ordering by relevance).
Here's an example of my SQL:
SELECT item_title
FROM items
WHERE MATCH(item_title, tags, item_category) AGAINST('poster')
You have to recompile MySQL to change this. From the documentation on "Fine-Tuning MySQL Full-Text Search":
The 50% threshold for natural language searches is determined by the particular weighting scheme chosen. To disable it, look for the following line in storage/myisam/ftdefs.h:
#define GWS_IN_USE GWS_PROB
Change that line to this:
#define GWS_IN_USE GWS_FREQ
Then recompile MySQL. There is no need to rebuild the indexes in this case.

How to search either on id or name for certain purchase orders

We would like to filter purchase orders either based on purchase order id (primary key) or name of the purchase order using a single search box.
We used the LIKE operator to search the name field, but it doesn't seem to work on the primary key; it only works when we use the equality operator for ids. It would be preferable if we could also filter purchase orders by id using LIKE. How can we do this?
create table purchase_orders (
id int(11) primary key,
name varchar(255),
...
)
Option 1
SELECT *
FROM purchase_orders
WHERE id LIKE '%123%'; -- tribute to TemporaryNickName
This is horrible, performance-wise :)
Option 2a
Add a text column which receives a string version of id. Maybe add some triggers to populate it automatically.
Option 2b
Change the type of id column to CHAR or VARCHAR (I believe CHAR should be preferred for a primary key).
In both cases (2a and 2b), add an index (maybe a FULLTEXT one) to this column.
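A sketch of Option 2a, using SQLite as a stand-in for MySQL (trigger syntax differs between the two). The column name `id_text`, the trigger names and the `search` helper are hypothetical. Two triggers keep the string copy of id in sync on insert and on id changes, and a single search then runs LIKE against both id_text and name. In MySQL, a B-tree index on such a column can serve prefix patterns like '123%', but a leading wildcard such as '%123%' still forces a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_orders (
    id      INTEGER PRIMARY KEY,
    name    TEXT,
    id_text TEXT            -- hypothetical string copy of id, kept by triggers
);
CREATE INDEX idx_po_id_text ON purchase_orders (id_text);

CREATE TRIGGER po_id_text_ins AFTER INSERT ON purchase_orders
BEGIN
    UPDATE purchase_orders SET id_text = CAST(NEW.id AS TEXT) WHERE id = NEW.id;
END;

CREATE TRIGGER po_id_text_upd AFTER UPDATE OF id ON purchase_orders
BEGIN
    UPDATE purchase_orders SET id_text = CAST(NEW.id AS TEXT) WHERE id = NEW.id;
END;
""")
conn.executemany("INSERT INTO purchase_orders (id, name) VALUES (?, ?)",
                 [(123, "Office chairs"), (4123, "Printer paper"),
                  (77, "Order 123 follow-up")])

def search(term):
    """Match the term anywhere in the id's text form or in the name."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id FROM purchase_orders "
        "WHERE id_text LIKE ? OR name LIKE ? ORDER BY id",
        (pattern, pattern)).fetchall()
    return [r[0] for r in rows]

print(search("123"))  # [77, 123, 4123]
```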
I think LIKE should work. I assume that your SQL wasn't correctly written.
Assuming you have an order named "ABCDEF", you can find it with the following query:
SELECT id FROM purchase_orders WHERE name LIKE '%CD%';
The % sign is a wildcard that matches any sequence of characters (including none), so this query selects any name that contains "CD".
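A sketch of how one search box could cover both cases, again in SQLite standing in for MySQL: the name is always matched with LIKE, and when the input is purely numeric an exact comparison on the primary key is added. The helper name `search_orders` and the sample rows are hypothetical. The id equality is cheap, but the leading-wildcard LIKE on name still scans the table; % or _ in the user's input are not escaped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase_orders (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO purchase_orders VALUES (?, ?)",
                 [(1, "ABCDEF"), (2, "XYZ"), (36, "Paper CD sleeves")])

def search_orders(box):
    """One search box: LIKE on name, plus an exact id match for numeric input."""
    term = box.strip()
    sql = "SELECT id, name FROM purchase_orders WHERE name LIKE ?"
    params = [f"%{term}%"]
    if term.isdecimal():
        sql += " OR id = ?"  # equality on the primary key can use its index
        params.append(int(term))
    return conn.execute(sql + " ORDER BY id", params).fetchall()

print(search_orders("CD"))  # [(1, 'ABCDEF'), (36, 'Paper CD sleeves')]
print(search_orders("2"))   # [(2, 'XYZ')]
```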
According to the table structure, name is a VARCHAR(255). Searching it with LIKE '%...%' is expensive: because the pattern starts with a wildcard, MySQL cannot use an ordinary index on name and has to scan every row, which gets slower as the table grows. You can always search by id instead, with WHERE id = something, which is much faster, but I don't think an order id is user-friendly data; instead I would let users search by product name. My recommendation is to use Apache Lucene or MySQL's full-text search feature (which can improve search performance).
Apache Lucene
MySQL Full-Text Search Functions
These tools are built to search for words or patterns in large amounts of text much faster. Many websites use them to build their own mini search engines. I found MySQL's full-text search has pretty much no learning curve and is straightforward to use =D

How to store these field descriptions in mysql?

Apologies for the long post; I didn't intend for it to be this long, but it's a pretty simple issue I've been having. :)
Let's say you have a simple table called tags that has columns tag_id and tag. The tag_id is simply an auto increment column and the tag is the title of the tag. If I need to add a description field, that would be around 1-2 paragraphs on average (max around 3-4 paragraphs probably), should I simply add a description field to the table or should I create a new table called tag_descriptions and store the descriptions with the tag_id?
I remember reading that it is better to do this because, even in a query that doesn't select the description, the description field will still slow down MySQL. Is this true? I don't even remember where I read it, but I've been following it for a couple of years now... In the end, I wonder whether I need to do this at all; I have a feeling I don't. You'd also need an inner join whenever you need the description field.
Another question I have is, is it generally bad to create new tables that will only hold very few rows at the max? What if this data doesn't fit anywhere else?
I have a simple case below which relates to these two questions.
I have three tables content, tags, and content_tags that make up a many to many relationship:
content:
content_id
region (ENUM column with about 6-7 different values; most likely won't grow later on)
tags:
tag_id
tag
content_tags:
content_id
tag_id
I want to store a description around 1-2 paragraphs for each tag, but also for each region. I'm wondering what would be the best way to do this?
Option A:
Just add a description column to the tags table.
Create a new table for region_descriptions.
Option B:
Create a new table called descriptions with the fields id, description, and type.
The id would be the id of the tag or of the enum value.
The type would say whether it is a tag description or a region description (I would use an ENUM column for this).
Maybe have a primary key on id and type?
Option C:
Create a new table for tag_descriptions.
Create a new table for region_descriptions.
Option A seems to be a good choice if adding the description column doesn't slow down MySQL SELECT queries that don't need the description.
Assuming the description column would slow down MySQL, option B might be a good choice. It also removes the need for a small table with just 6-7 rows to hold the region descriptions. Then again, would looking up a region description in this shared table be slow, when a dedicated region table would only have a handful of rows to go through?
Option C would be ideal if the description columns would slow down MySQL and if a small table like region_descriptions would not matter.
Maybe none of these options are the best, feel free to offer another option. Thanks.
P.S. What would be an ideal column type for data that is usually 1-2 paragraphs, but might sometimes be a little longer?
I don't think it really matters if you don't handle thousands of queries per minute. If you are going to have a zillion queries per minute, then I would implement the various options and perform benchmarks for all these options. Based on the results, you can make a decision.
In my (admittedly somewhat uninformed) opinion, it really depends on how much you'll be using both of them.
If properly indexed, that JOIN should not be very expensive. Also, a larger table will be slower. It inhibits caching, and takes longer to access stuff, although indexing seriously mitigates this problem.
If you'll be joining tag names to tag IDs a LOT, and only rarely will be using the descriptions, I'd say go with separate tables. If you'll be using the descriptions more often, go with one table.
For the first part of your question: if you have a tag with an id, a name and a description, you should save it in 1 table.
Now, this query
SELECT name FROM tags WHERE id = 1;
will NOT slow down if you have 1, 2 or 20 extra fields in there.

How do I select a row in MySQL that contains multiple values?

I have a MySQL table that looks like this:
Table: Designer
id: integer
name: String
gallery: string
The gallery column can contain a string value like 23,36,45
Now I want to do a query similar to this:
SELECT * FROM Designer WHERE gallery = '36'
I know I can use LIKE, but that is not precise enough: it could return both 36 and 136.
I also know I could create another table which links designer and gallery. But since this is not going to be a huge table, I'm storing the gallery IDs directly in the gallery column. And I'm lazy right now ;)
So how can I select a row that has the number 36?
UPDATE
OK, since I'm getting nailed for poor design (yes, I know it was), I see the obvious now.
A designer can have many galleries, but a gallery can only belong to one designer.
Therefore I only need to add designer ID as a foreign key to the gallery table.
Simple. But not always logical when it's 3 AM and you've been working for 15 hours ;)
If you have to do that you have poorly designed your tables.
One designer can have many galleries and a gallery belongs to one designer, which means you should create a foreign key 'designer' in your 'gallery' table. Your query will then be:
SELECT *
FROM Designer
INNER JOIN Gallery
ON Gallery.id = 36
AND Designer.id = Gallery.designer
I also agree that this is a poorly designed table structure, but here is the answer:
SELECT * FROM Designer where FIND_IN_SET(36,gallery)
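To show what FIND_IN_SET actually checks, here is a small Python emulation registered as an SQLite function (SQLite has no FIND_IN_SET of its own, so `find_in_set` below is a hand-written stand-in that mirrors MySQL's documented behaviour: the 1-based position of the string in a comma-separated list, 0 if absent, NULL if either argument is NULL). Because items are compared exactly, "136" does not match 36, but a list written with spaces such as "23, 36" would not match either. Like any function wrapped around a column, it cannot use an index.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Stand-in for MySQL FIND_IN_SET: 1-based position in a comma list, else 0."""
    if needle is None or haystack is None:
        return None
    items = str(haystack).split(",") if haystack != "" else []
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute("CREATE TABLE Designer (id INTEGER PRIMARY KEY, name TEXT, gallery TEXT)")
conn.executemany("INSERT INTO Designer VALUES (?, ?, ?)",
                 [(1, "Ann", "23,36,45"), (2, "Bob", "136,7"),
                  (3, "Cy", "36"), (4, "Di", "")])

rows = conn.execute(
    "SELECT id FROM Designer WHERE FIND_IN_SET(36, gallery) ORDER BY id").fetchall()
matching_ids = [r[0] for r in rows]
print(matching_ids)  # [1, 3]
```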
You really shouldn't store your data like that since it makes queries horribly inefficient. For that particular use case, you can (if you don't care about performance) use:
select * from designer
where gallery like '36,%'
or gallery like '%,36'
or gallery like '%,36,%'
or gallery = '36'
This will alleviate your concerns about partial matches since something like 136 will not match any of those conditions but 36 in any position will match.
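The four LIKE patterns can be checked quickly with SQLite standing in for MySQL; the table contents and the helper name `designers_with_gallery` are hypothetical, and the gallery id is bound as a parameter rather than pasted into the SQL. SQLite concatenates with `||`; in MySQL the same patterns would be written with CONCAT or as literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE designer (id INTEGER PRIMARY KEY, gallery TEXT)")
conn.executemany("INSERT INTO designer VALUES (?, ?)",
                 [(1, "23,36,45"), (2, "36,1"), (3, "7,36"), (4, "36"),
                  (5, "136,361"), (6, "1363")])

def designers_with_gallery(gallery_id):
    """The four LIKE conditions from the answer, with the id as a parameter."""
    g = str(gallery_id)
    rows = conn.execute("""
        SELECT id FROM designer
        WHERE gallery LIKE ? || ',%'           -- first in the list
           OR gallery LIKE '%,' || ?           -- last in the list
           OR gallery LIKE '%,' || ? || ',%'   -- somewhere in the middle
           OR gallery = ?                      -- the only entry
        ORDER BY id
    """, (g, g, g, g)).fetchall()
    return [r[0] for r in rows]

print(designers_with_gallery(36))  # [1, 2, 3, 4]
```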
However, despite your protestations to the contrary, what you should do is re-engineer your schema, something like:
designer:
id integer primary key
name varchar(whatever)
designer_galleries:
designer_id integer foreign key references designer(id)
gallery string
primary key (designer_id,gallery)
Then your queries will be blindingly fast. One of the first things you should learn is to always design your schema for third normal form. You can revert to other forms for performance once you understand the trade-offs (and know how to mitigate problems), but it's rarely necessary unless your database is horribly convoluted.
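A sketch of the normalized schema from the answer, run in SQLite as a stand-in for MySQL, with hypothetical sample rows. One addition of mine: the primary key (designer_id, gallery) starts with designer_id, so it cannot serve a lookup by gallery alone; the extra index on gallery covers the "who has gallery 36?" query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE designer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE designer_galleries (
    designer_id INTEGER NOT NULL REFERENCES designer(id),
    gallery     TEXT NOT NULL,
    PRIMARY KEY (designer_id, gallery)
);
-- Added so that lookups by gallery alone can use an index.
CREATE INDEX idx_gallery ON designer_galleries (gallery);
""")
conn.executemany("INSERT INTO designer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO designer_galleries VALUES (?, ?)",
                 [(1, "23"), (1, "36"), (1, "45"), (2, "136")])

# An exact match: '136' can no longer be confused with '36'.
rows = conn.execute("""
    SELECT d.id, d.name
    FROM designer d
    JOIN designer_galleries dg ON dg.designer_id = d.id
    WHERE dg.gallery = ?
""", ("36",)).fetchall()
print(rows)  # [(1, 'Ann')]
```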
You can use regular expressions in your WHERE clause instead.
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
SELECT * FROM Designer WHERE gallery REGEXP "(,|^)(36)(,|$)"
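The same regular expression can be tried in SQLite, which parses the REGEXP operator but only evaluates it once a `regexp(pattern, value)` function is registered; the Python `regexp` below is that hand-written stand-in. Python's `re` and MySQL's regex engine are different dialects, although this simple pattern behaves the same in both. The helper `designers_with_gallery` and the sample rows are hypothetical; `re.escape` guards against ids containing regex metacharacters.

```python
import re
import sqlite3

def regexp(pattern, value):
    """SQLite evaluates `value REGEXP pattern` as regexp(pattern, value)."""
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE Designer (id INTEGER PRIMARY KEY, gallery TEXT)")
conn.executemany("INSERT INTO Designer VALUES (?, ?)",
                 [(1, "23,36,45"), (2, "136,7"), (3, "36"), (4, "7,36")])

def designers_with_gallery(gallery_id):
    # Same shape as the answer's pattern: comma or start, the id, comma or end.
    pattern = rf"(,|^)({re.escape(str(gallery_id))})(,|$)"
    rows = conn.execute(
        "SELECT id FROM Designer WHERE gallery REGEXP ? ORDER BY id", (pattern,))
    return [r[0] for r in rows]

print(designers_with_gallery(36))  # [1, 3, 4]
```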

mysql keyword search across multiple columns

Various incarnations of this question have been asked here before, but I thought I'd give it another shot.
I had a terrible database layout. A single entity (widget) was split into two tables:
CREATE TABLE widgets (widget_id int(10) NOT NULL auto_increment PRIMARY KEY);
CREATE TABLE widget_data (
widget_id int(10),
field ENUM('name','size','color','brand'),
data TEXT);
This was less than ideal. If I wanted to find widgets of a specific name, color and brand, I had to do a three-way join on the widget_data table. So I converted to a more reasonable table layout:
CREATE TABLE widgets (widget_id int(10) NOT NULL auto_increment PRIMARY KEY,
name VARCHAR(32), size INT(3), color VARCHAR(16), brand VARCHAR(32));
This makes most queries much better, but it makes searching harder. It used to be that if I wanted to search widgets for, say, '%black%', I would just run SELECT * FROM widget_data WHERE data LIKE '%black%'. This would give me all widgets that are black in color, or are made by Blackwell Industries, or whatever. I would even know exactly which field matched, and could show that to my user.
How do I run a similar search with the new table layout? I could of course do WHERE name LIKE '%black%' OR size LIKE '%black%'..., but that seems clunky, and I still don't know which fields matched. I could run a separate query for each column I want to match on, which would give me all matches and how they matched, but that would be a performance hit. Any ideas?
You can move parts of the WHERE expression into the selected columns. For example:
SELECT
*,
(name LIKE '%black%') AS name_matched,
(size LIKE '%black%') AS size_matched
FROM widgets
WHERE name LIKE '%black%' OR size LIKE '%black%'...
Then check the value of name_matched in your script.
I'm not sure how this will affect performance. Feel free to test it before going to production.
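A sketch of the per-column match flags, using SQLite as a stand-in for MySQL; the sample widgets and the `search_widgets` helper are hypothetical. Each `(column LIKE :p)` expression evaluates to 1 or 0, so one query returns both the matching rows and which fields matched. The column names come from a fixed tuple, never from user input, which is what makes building the SQL with an f-string safe here. In MySQL with a case-insensitive collation, LIKE behaves similarly for these ASCII examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE widgets (
    widget_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT, size INTEGER, color TEXT, brand TEXT)""")
conn.executemany(
    "INSERT INTO widgets (name, size, color, brand) VALUES (?, ?, ?, ?)",
    [("Gadget", 3, "black", "Acme"),
     ("Gizmo", 5, "red", "Blackwell Industries"),
     ("Doohickey", 2, "blue", "Acme")])

COLUMNS = ("name", "size", "color", "brand")  # fixed list, never user input

def search_widgets(term):
    """Return (widget_id, [matched column names]) for every matching widget."""
    flags = ", ".join(f"({c} LIKE :p) AS {c}_matched" for c in COLUMNS)
    where = " OR ".join(f"{c} LIKE :p" for c in COLUMNS)
    sql = f"SELECT widget_id, {flags} FROM widgets WHERE {where} ORDER BY widget_id"
    results = []
    for widget_id, *matched in conn.execute(sql, {"p": f"%{term}%"}):
        results.append((widget_id, [c for c, m in zip(COLUMNS, matched) if m]))
    return results

print(search_widgets("black"))  # [(1, ['color']), (2, ['brand'])]
```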
You have two conflicting requirements. You want to search as if all your data is in a single field, but you also want to identify which specific field was matched.
There's nothing wrong with your WHERE name LIKE '%black%' OR size LIKE '%black%'... expression. It's a perfectly valid search on the table as you have defined it. Why not just check the results in code to see which one matched? It's a minimal overhead.
If you want a cleaner syntax for your SQL then you could create a view on the table, adding an extra field which consists of concatenating the other fields:
CREATE VIEW extra_widget_data AS
SELECT widget_id, name, size, color, brand,
-- CONCAT_WS skips NULLs; plain CONCAT returns NULL if any argument is NULL
CONCAT_WS(' ', name, size, color, brand) AS all_fields
FROM widgets;
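To make this fast you would also need an index on the combined value (MySQL cannot index a view, so it would have to be a real column in the table), which requires more space, CPU time to maintain, etc. I don't think it's worth it.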
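A sketch of the concatenated-column view, in SQLite as a stand-in for MySQL (SQLite concatenates with `||`, so COALESCE plays the NULL-skipping role of CONCAT_WS). The sample rows are hypothetical; the second widget deliberately has a NULL color, which would make a plain concatenation NULL and silently drop it from the results. The ' | ' separator keeps adjacent fields from running together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE widgets (
    widget_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT, size INTEGER, color TEXT, brand TEXT
);
-- ' | ' keeps adjacent fields from running together; COALESCE stops a single
-- NULL column from turning the whole concatenation into NULL.
CREATE VIEW extra_widget_data AS
SELECT widget_id, name, size, color, brand,
       COALESCE(name, '') || ' | ' || COALESCE(size, '') || ' | ' ||
       COALESCE(color, '') || ' | ' || COALESCE(brand, '') AS all_fields
FROM widgets;
""")
conn.executemany(
    "INSERT INTO widgets (name, size, color, brand) VALUES (?, ?, ?, ?)",
    [("Gadget", 3, "black", "Acme"),
     ("Gizmo", 5, None, "Blackwell Industries"),  # NULL color on purpose
     ("Doohickey", 2, "blue", "Acme")])

matches = [r[0] for r in conn.execute(
    "SELECT widget_id FROM extra_widget_data WHERE all_fields LIKE ? ORDER BY widget_id",
    ("%black%",))]
print(matches)  # [1, 2]
```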
You probably want to look into MySQL's full-text search capability, which lets you match against multiple VARCHAR columns at once.
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html