Search a table with comma delimited string? - mysql

I have a column in my table called tags. It has comma delimited text in it (box, hat, car).
I select a row from my table and I want to find other rows that have similar tags to the tags in the row selected.
I know it could be a better table design, but I can't change the design.
I know this will search the tags for a keyword, but I don't want to search via a keyword but by a list of tags.
WHERE CONCAT(',', Tags, ',') LIKE '%,keyword,%'
Does anyone know how I would do this?
I'm using MySQL.

You say you can't change the design. Can you add a FULLTEXT index and switch this table to the MyISAM storage engine? If so, you can use FULLTEXT searching. For your application you'll do best using BOOLEAN mode.
WHERE MATCH (tags)
AGAINST ('box hat -car' IN BOOLEAN MODE);
This particular search matches rows that contain box or hat, and excludes rows that contain car. To require both words, prefix each with a plus sign: '+box +hat -car'.
Here's a description. This might work well for you.
http://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html
From MySQL 5.6 onward, InnoDB supports FULLTEXT indexes as well. On older versions only MyISAM tables can have one.
If you can't use a fulltext search, you're stuck with the wrong tool for your job. Matching comma-delimited strings with SQL is like driving nails by hitting them with the handle of a screwdriver. Both take a long time and are incredibly awkward.
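If fulltext is not an option and you stay with the LIKE approach from the question, the list of tags can at least be turned into one condition per tag in application code. A minimal sketch in Python; the language choice, the helper names split_tags, escape_like and like_clause, and the %s placeholder style are all assumptions, not something from the question:

```python
def split_tags(tags_csv):
    """Split 'box, hat, car' into ['box', 'hat', 'car'], dropping blanks and repeats."""
    tags = []
    for raw in tags_csv.split(","):
        tag = raw.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def escape_like(value):
    """Escape LIKE wildcards so a tag such as '50%_off' is matched literally."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def like_clause(tags_csv, column="tags"):
    """Return (sql, params) matching rows that share at least one tag.

    `column` is interpolated into the SQL, so it must be a trusted name.
    REPLACE() normalises ', ' separators to ',' before the comparison.
    """
    tags = split_tags(tags_csv)
    if not tags:
        return "FALSE", []
    condition = "CONCAT(',', REPLACE({0}, ', ', ','), ',') LIKE %s".format(column)
    sql = "(" + " OR ".join([condition] * len(tags)) + ")"
    params = ["%," + escape_like(tag) + ",%" for tag in tags]
    return sql, params
```

Calling like_clause with the tags of the selected row gives a fragment to append after WHERE, with the patterns passed as query parameters. It still scans every row; it only saves writing the conditions by hand.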

Related

SQL Thesaurus like behaviour without Full-text Search

My search engine is written using MySQL and ColdFusion. The main part of it is where I loop the following (or similar) over each search term, separated by spaces.
AND productname REGEXP '(\\b#trim(search)#\\b)'
I'd like to introduce alternative words, so that when any one of these is searched for, the rest are included:
M43, M4/3, Micro-Four-Thirds, MFT
I could use a separate query to get a list of matching keywords and loop in an OR - but this seemed a little unrefined.
My database is a list of camera product names with little consistency. I tested full-text search but was not happy with the results.
So is there a thesaurus-like option that doesn't require implementing full-text search?
I added a product keyword field, which I populated with alternative keywords.
My SQL WHERE clause now loops over the terms and compares each against the concatenated product name and keywords:
AND concat_ws(' ', productname, productkeywords) REGEXP '(\\b#trim(search)#\\b)'
Thanks to @SevRoberts for this solution.
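For comparison, the other route mentioned in the question (look up the alternatives first, then OR them together) can be sketched in application code. This is only an illustration in Python rather than ColdFusion; SYNONYM_GROUPS, expand and matches are hypothetical names, and the matching here uses Python's re module, not MySQL's REGEXP:

```python
import re

# Hypothetical thesaurus: each set lists terms that should match one another.
SYNONYM_GROUPS = [
    {"M43", "M4/3", "Micro-Four-Thirds", "MFT"},
]


def expand(term):
    """Return the term together with its alternatives, if it has any."""
    cleaned = term.strip()
    for group in SYNONYM_GROUPS:
        if cleaned.lower() in {member.lower() for member in group}:
            return sorted(group)
    return [cleaned]


def matches(term, productname):
    """True if the term, or any of its alternatives, occurs as a whole word."""
    alternatives = "|".join(re.escape(alt) for alt in expand(term))
    # Lookarounds instead of \b, so terms containing '/' or '-' still work.
    pattern = r"(?<!\w)(?:" + alternatives + r")(?!\w)"
    return re.search(pattern, productname, re.IGNORECASE) is not None
```

The keyword-field solution above keeps the synonyms next to each product; this variant keeps them in one place instead, at the cost of a lookup per search term.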

How to speed up search MySQL? Is fulltext search with special characters possible?

I have strings like the following in my VARCHAR InnoDB table column:
"This is a {{aaaa->bbb->cccc}} and that is a {{dddd}}!"
Now, I'd like to search for e.g. {{xxx->yyy->zzz}}. The brackets are part of the string. Sometimes the search also involves another column, but that one only contains an ordinary id and so doesn't need to be considered (I guess).
I know I can use LIKE or REGEXP, but I have already tried both and they are too slow. Can I introduce a fulltext index? Or should I add a helper table? Should I replace the special characters {, }, -, > to get words for the fulltext search? Or what else could I do?
The search runs over about ten thousand rows, and I assume that I will often get about one hundred hits.
This link should give you all the info you need regarding FULLTEXT indexes in MySQL.
MySQL dev site
The section that you will want to pay particular attention to is:
"Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row."
So, in short, to answer your question: you should see an improvement in query execution times from adding a full-text index on wide VARCHAR columns, provided you are using a compatible storage engine (MyISAM, or InnoDB from MySQL 5.6 onward).
Also here is an example of how you can query the full text index and also an additional ID field as hinted in your question:
SELECT *
FROM my_table
WHERE MATCH (fieldlist) AGAINST ('search text here')
AND field2 = '1234';
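On the helper-table idea from the question: one option is to pull the {{...}} placeholders out in application code and store them in a separate, normally indexed table, so that a lookup becomes an exact match instead of a LIKE scan. A sketch in Python, where the function names and the (row_id, placeholder) layout are assumptions:

```python
import re

# Non-greedy match of whatever sits between '{{' and '}}'.
PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def extract_placeholders(text):
    """Return the contents of every {{...}} in the text, in order of appearance."""
    return [found.strip() for found in PLACEHOLDER.findall(text)]


def helper_rows(row_id, text):
    """Build distinct (row_id, placeholder) tuples for a hypothetical helper table."""
    return [(row_id, name) for name in sorted(set(extract_placeholders(text)))]
```

With an ordinary index on the placeholder column, a search for {{xxx->yyy->zzz}} turns into an equality comparison on 'xxx->yyy->zzz' joined back to the main table by row id. The helper rows have to be refreshed whenever the text changes.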

How do I make data retrieval faster? [duplicate]

I am building a search feature for the messages part of my site. I have a messages database with a little over 9,000,000 rows and an index on the sender, subject, and message fields. I was hoping to use the MySQL LIKE operator in my query, for example:
SELECT sender, subject, message FROM Messages WHERE message LIKE '%EXAMPLE_QUERY%';
to retrieve results. Unfortunately, MySQL doesn't use indexes when a leading wildcard is present, and the leading wildcard is necessary because the search query could appear anywhere in the message (this is how the wildcards work, no?). Queries are very, very slow, and I cannot use a full-text index either because of the annoying 50% rule (I just can't afford to rule that much out). Is there any way (or any alternative) to optimize a query using LIKE with two wildcards? Any help is appreciated.
You should either use full-text indexes (you said you can't), design a full-text search by yourself or offload the search from MySQL and use Sphinx/Lucene. For Lucene you can use Zend_Search_Lucene implementation from Zend Framework or use Solr.
Normal indexes in MySQL are B+trees, and they can't be used if the start of the string is not known (which is the case when the pattern begins with a wildcard).
Another option is to implement search on your own, using a reference table. Split the text into words and create a table that contains (word, record_id). Then, at search time, split the query into words and look up each word in the reference table. This way you are not limited to matching from the beginning of the whole text, only from the beginning of each word (and you'll match the rest of the words anyway).
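The reference-table idea can be tried out end to end. The sketch below uses Python with an in-memory SQLite database purely as a stand-in for MySQL; the table name message_words, the helper names, and the tokenizer regex are assumptions:

```python
import re
import sqlite3


def words(text):
    """Lower-case the text and split it into distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT)")
# The (word, record_id) key doubles as the index used for prefix lookups.
conn.execute(
    "CREATE TABLE message_words ("
    "word TEXT, record_id INTEGER, PRIMARY KEY (word, record_id))"
)


def add_message(record_id, message):
    """Store the message and one reference row per distinct word."""
    conn.execute("INSERT INTO messages VALUES (?, ?)", (record_id, message))
    conn.executemany(
        "INSERT INTO message_words VALUES (?, ?)",
        [(word, record_id) for word in words(message)],
    )


def search(query):
    """Return ids of messages containing a word starting with every query term."""
    terms = sorted(words(query))
    if not terms:
        return []
    sql = " INTERSECT ".join(
        "SELECT record_id FROM message_words WHERE word LIKE ?" for _ in terms
    )
    rows = conn.execute(sql, [term + "%" for term in terms]).fetchall()
    return sorted(row[0] for row in rows)
```

Each pattern has the form 'term%', with no leading wildcard, which is what lets an ordinary index help. INTERSECT is used here because SQLite supports it; as far as I know it only arrived in MySQL with 8.0.31, so on older versions a join or a GROUP BY with HAVING COUNT would play the same role.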
'%EXAMPLE_QUERY%' is a very, very bad idea. Here are some suggestions:
A. Avoid wildcards at the start of LIKE patterns; use 'EXAMPLE_QUERY%' instead.
B. Create keywords that you can easily search with MATCH.
If you want to stick with using MySQL, you should use FULL TEXT indexes. Full text indexes index words in a text block. You can then search on word stems and return the results in order of relevance. So you can find the word "example" within a block of text, but you still can't search efficiently on "xampl" to find "example".
MySQL's full text search is not great, but it is functional.
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
select * from emp where ename like '%e';
returns the employees whose name ends with the letter e.
select * from emp where ename like 'A%';
returns the employees whose name begins with the letter A.
select * from emp where ename like '_a%';
returns the employees whose name has a as its second letter.

Making custom full text search on mysql (making index file)

I'd like to make my own custom full text search and I am not sure what is the best way to make index table.
OK, I take the text field and extract all the words that are longer than 3 characters into the index table. But what do I have to store about them? The word, and the ID of the row in the table I am searching? Anything else? The frequency of the word?
And a follow-up question: how do I split the text field into words? Is there a MySQL function for this, or should I do it in a server-side language?
UPDATE: To make things clear: I don't need full text search just a wordlist of the words that are in all records of my text field, so I could search for the endings with LIKE 'word%'
If you are only going to implement what MySQL calls boolean mode (no relevance counting), you should implement the following basic functionality:
A wordbreaker, an algorithm that splits the strings into words. This is trivial in English but can be a problem for some Asian languages which do not use spaces between words.
Optionally, a stemmer, an algorithm which reduces words to their basic forms, so that went and gone both become go.
Optionally, a spellchecker, an algorithm which corrects the common spelling errors.
Optionally, a thesaurus, which reduces the synonyms to their common form.
A result of all this is that you have a string like this:
a fast oburn vixen jmups over an indolent canine
split into the basic forms of the words with the synonyms replaced and errors corrected:
quick
brown
fox
jump
over
lazy
dog
Then you just create a composite index on (word, rowid), where word is the basic form and rowid is the PRIMARY KEY of the record indexed.
To query for, say, '+quick +fox', you should search your index for these words and find an intersection on rowid. The intersecting rowid will contain both words.
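The pipeline described above (word breaking, normalisation, an index on (word, rowid), intersection at query time) can be sketched in a few lines of Python. The tiny SPELLING, STEMS and SYNONYMS tables are made-up stand-ins for a real spellchecker, stemmer and thesaurus, and dropping "a" and "an" as stopwords is an assumption made to reproduce the word list shown earlier:

```python
import re
from collections import defaultdict

# Toy lookup tables; a real system would use proper linguistic tools.
SPELLING = {"oburn": "brown", "jmups": "jumps"}
STEMS = {"jumps": "jump"}
SYNONYMS = {"fast": "quick", "vixen": "fox", "indolent": "lazy", "canine": "dog"}
STOPWORDS = {"a", "an", "the"}


def normalize(text):
    """Break text into words and reduce each to its basic, canonical form."""
    result = []
    for word in re.findall(r"[a-z]+", text.lower()):
        word = SPELLING.get(word, word)
        word = STEMS.get(word, word)
        word = SYNONYMS.get(word, word)
        if word not in STOPWORDS:
            result.append(word)
    return result


def build_index(rows):
    """Map each basic word to the set of rowids containing it."""
    index = defaultdict(set)
    for rowid, text in rows.items():
        for word in normalize(text):
            index[word].add(rowid)
    return index


def search_all(index, query):
    """Return the rowids that contain every word of the query."""
    terms = normalize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

In the database version, the dictionary becomes the table with the composite (word, rowid) index, and the set intersection becomes a self-join or an INTERSECT on rowid.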
If you are going to take relevance into account, you should additionally maintain per-word statistics in a separate index over the whole corpus.
I should warn you that this is not a simple task. Just take a look at Sphinx source code.
Don't do it
Unless you know what you are doing, forget about rolling your own full-text search.
Let MySQL do the heavy lifting.
Use MyISAM for the table you want to search on.
Put a FULLTEXT index on the text-fields you want to index.
Then do
SELECT *, MATCH(field1, field2) AGAINST ('text to search'
  IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AS relevancy
FROM table1
WHERE MATCH(field1, field2) AGAINST ('text to search'
  IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION)
ORDER BY relevancy DESC;
See: http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html#function_match
The OP indicated that he wants to search for word endings.
Where I live (Holland) we write poems in December, so I do a lot of word-ending searches to find words that rhyme.
Here's my trick.
Add a new indexed VARCHAR column named visa_versa to your table:
UPDATE mytable SET mytable.visa_versa = REVERSE(mytable.myword);
Now you can do an indexed search on word endings with
SET @ending = 'end';
SELECT myword FROM mytable WHERE visa_versa LIKE REVERSE(CONCAT('%', @ending));
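Why the reversal helps: a suffix of the word is a prefix of the reversed word, and a sorted index can answer prefix lookups directly. A small Python illustration with a sorted list standing in for the index (the function names are made up for this sketch):

```python
import bisect


def build_reversed_index(words):
    """Sorted list of (reversed word, word), playing the role of the visa_versa index."""
    return sorted((word[::-1], word) for word in words)


def words_ending_with(index, ending):
    """Find words with the given ending via a prefix scan on the reversed form."""
    prefix = ending[::-1]
    # A 1-tuple sorts before any 2-tuple starting with the same string,
    # so this lands on the first entry whose reversed word is >= prefix.
    start = bisect.bisect_left(index, (prefix,))
    found = []
    for reversed_word, word in index[start:]:
        if not reversed_word.startswith(prefix):
            break  # sorted order: no later entry can match
        found.append(word)
    return sorted(found)
```

The binary search jumps straight to the first candidate and the scan stops at the first non-match, which is the same access pattern the database gets from LIKE 'dne%' on the indexed column.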

Proper mysql datastructure for a fulltext search

Hoping someone can provide some mysql advice...
I have 2 tables that look like this:
searchTagsTable
    ID
    tag
dataTable
    ID
    title
    desc
    tagID
So the column "tagID" in "dataTable" is a comma-delimited string of ids pointing to searchTagsTable.
I'd like to use MySQL's built-in fulltext search capabilities to search title, description, and tags.
I'm wondering: What is considered the "best" solution in a situation like this?
Should I leave the datastructure as it is? If so, how should I structure the sql to allow fulltext search of all three columns - title, desc and tag?
Or would it be preferable just to get rid of keywordsTable and have the actual tags comma-delimited in a "tags" column in dataTable?
Thanks in advance for your help.
Travis
Should I leave the datastructure as it is? If so, how should I structure the sql to allow fulltext search of all three columns - title, desc and tag?
That wouldn't be possible. Indexes can only span columns of a single table.
Or would it be preferable just to get rid of keywordsTable and have the actual tags comma-delimited in a "tags" column in dataTable?
That would certainly be the simplest solution. You are currently not really getting any benefit from giving tags their own identity, since you can't use foreign keys and indexing on them.
However, MySQL's FULLTEXT indexing is not ideal for a tag system:
by default, it won't index words shorter than four letters;
by default, it has many (many) stopwords it won't index that you might want to use for tags;
it'll be less efficient than a normal index;
it only works in MyISAM (InnoDB did not get FULLTEXT support until MySQL 5.6), which is in all other respects a much worse database engine than InnoDB. Except where you really have to, you shouldn't really be using MyISAM today.
You can fix the minimum word length and stopwords by altering the MySQL configuration. This will make your indexes much bigger though. This may be an acceptable solution if you control the database everywhere your app will be deployed, and if you are only using tags as ‘extra words’ in a fulltext search-fodder, rather than a full categorisation system.
Otherwise... comma-delimited anything in a database is suspect IMO. It's usually better to use a one-to-many join table to express the idea that one entity has many tags. Then you can use a simple index to aid lookups instead of the limited FULLTEXT indexing scheme, which will be faster and more reliable, and will allow you to use InnoDB and foreign keys. E.g.:
dataTable
    ID (primary key)
    title
    desc
dataTags
    ID (foreign key -> dataTable)
    tagName (index this column)
(You could still have the tagID->tagName mapping as well on top of this if you want the tags to have independent identity. I'm not sure if it's doing anything useful in your case though.)
If you need to get a comma-separated list from a one-to-many relation like the above, you can do it using the MySQL-specific GROUP_CONCAT function.
SELECT dataTable.*, GROUP_CONCAT(dataTags.tagName)
FROM dataTable
JOIN dataTags ON dataTags.ID=dataTable.ID
GROUP BY dataTable.ID;
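To see what the one-to-many layout buys you, here is a small runnable sketch. It uses Python with an in-memory SQLite database as a stand-in for MySQL; add_item, items_sharing_tags and the index name are hypothetical, and the comma-delimited input is only there to show how existing data could be split on the way in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataTable (ID INTEGER PRIMARY KEY, title TEXT, "desc" TEXT);
CREATE TABLE dataTags (
    ID INTEGER REFERENCES dataTable(ID),
    tagName TEXT,
    PRIMARY KEY (ID, tagName)
);
CREATE INDEX idx_dataTags_tagName ON dataTags (tagName);
""")


def add_item(item_id, title, desc, tags_csv):
    """Insert an item and one dataTags row per distinct tag."""
    conn.execute("INSERT INTO dataTable VALUES (?, ?, ?)", (item_id, title, desc))
    tags = {tag.strip() for tag in tags_csv.split(",") if tag.strip()}
    conn.executemany(
        "INSERT INTO dataTags VALUES (?, ?)",
        [(item_id, tag) for tag in sorted(tags)],
    )


def items_sharing_tags(item_id):
    """Return (other_id, shared_tag_count) rows, most shared tags first."""
    return conn.execute(
        """
        SELECT other.ID, COUNT(*) AS shared
        FROM dataTags AS mine
        JOIN dataTags AS other
          ON other.tagName = mine.tagName AND other.ID <> mine.ID
        WHERE mine.ID = ?
        GROUP BY other.ID
        ORDER BY shared DESC, other.ID
        """,
        (item_id,),
    ).fetchall()
```

Finding rows with similar tags is then a plain indexed self-join, with no string matching involved. Note that desc is a reserved word, which is why it is quoted in the CREATE TABLE above (MySQL would use backticks).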
That leaves the fulltext indexing of the title and desc, which, before MySQL 5.6, unfortunately does need you to put them in a MyISAM table.
A common alternative to this which you might also consider would be to keep the ‘canonical’ copies in the main table (potentially in an ACID-safe InnoDB table), and store a separate copy of all the title, desc and tags together in a FULLTEXT-indexed MyISAM table purely for fulltext search bait. This does mean you have to do an extra update each time you change the primary data (though if you fail or have to rollback a transaction, at least it's only relatively-unimportant search bait that's now wrong), but the advantage is you can apply extra processing to it, such as stemming and punctuation handling, which MySQL's FULLTEXT indexer doesn't do itself.