SQL Thesaurus like behaviour without Full-text Search - mysql

My search engine is written using MySQL and ColdFusion. The main part of it is where I loop the following (or similar) over each search term, with terms separated by spaces.
AND productname REGEXP '(\\b#trim(search)#\\b)'
I'd like to introduce alternative words, so that when any one of these is searched for, the rest are included:
M43, M4/3, Micro-Four-Thirds, MFT
I could use a separate query to get a list of matching keywords and loop in an OR - but this seemed a little unrefined.
My database is a list of camera product names with little consistency, I tested full-text search but was not happy with the results.
So is there a 'thesaurus'-like option that does not require a full-text search implementation?

I added a product keyword field, which I populated with alternative keywords.
My SQL WHERE clause now loops over the terms and compares each one against the concatenated product name and keywords:
AND concat_ws(' ', productname, productkeywords) REGEXP '(\\b#trim(search)#\\b)'
Thanks to @SevRoberts for this solution.
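A minimal sketch of the keyword-column approach described above. The table name products, the id column, the sample rows, and the literal search term M43 are assumptions made for illustration; only productname, productkeywords, and the REGEXP expression come from the post. As far as I know, \\b is understood by the ICU regex engine used in MySQL 8.0.4 and later, while older versions use the [[:<:]] and [[:>:]] word-boundary markers instead.

```sql
-- Hypothetical table; only the two column names come from the post.
CREATE TABLE products (
  id              INT AUTO_INCREMENT PRIMARY KEY,
  productname     VARCHAR(255) NOT NULL,
  productkeywords VARCHAR(255) NOT NULL DEFAULT ''
);

-- Made-up sample rows. A product that should be found under any of the
-- synonyms carries the whole synonym list in productkeywords.
INSERT INTO products (productname, productkeywords) VALUES
  ('Olympus 25mm f/1.8 MFT', 'M43 M4/3 Micro-Four-Thirds MFT'),
  ('Canon EF 50mm f/1.8',    '');

-- One AND clause like this per search term, as in the original loop.
SELECT id, productname
FROM products
WHERE CONCAT_WS(' ', productname, productkeywords) REGEXP '\\bM43\\b';
```

Because the search term ends up inside a regular expression, terms containing regex metacharacters would need escaping before being interpolated.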

Related

Extracting a value from an Array using mysql

I have a column that has brand names in an array format as below:
I want to extract information associated with Brand4 for example 'price'.
I tried using the below, but that's a psql query. How can I extract this information using MySQL in GCP.
SELECT Brand_name, price
FROM table_name
Where 'Brand4'=Any(Brand_name)
First, the explanation for your error message is that in MySQL, ANY() accepts a subquery, not just a single column or expression. See https://dev.mysql.com/doc/refman/8.0/en/any-in-some-subqueries.html
MySQL does not have an array type. Your Brand_name column is not an array, it's a string. It happens to contain commas and square brackets, but these are just characters in a string.
So your solutions are to use various string-search functions or expressions, as other folks have suggested.
The downside to all the string-search functions is that they cannot be optimized with a conventional index. So every search will be expensive, because it requires a table-scan.
Another solution I did not see yet is to use a fulltext index.
alter table brands add fulltext index (brand_name);
select * from brands
where match(brand_name) against ('Brand4' in boolean mode);
This may require some special handling if the brand names contain spaces or punctuation, but if they are plain words, it should work.
Read https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html to understand more about fulltext indexes.
The best solution would be to eliminate this fake "array" column by normalizing the schema to store one brand per row in another table. Then you can match strings exactly and optimize with a conventional index. But I understand you said that the table structure is not up to you.
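As a rough sketch of what that normalization could look like: the table product_brands, its columns, and the index name are hypothetical, since the original table structure is not shown in the question.

```sql
-- Hypothetical normalized table: one row per (product, brand) pair
-- instead of a bracketed, comma-separated string.
CREATE TABLE product_brands (
  product_id INT            NOT NULL,
  brand_name VARCHAR(100)   NOT NULL,
  price      DECIMAL(10, 2) NULL,
  PRIMARY KEY (product_id, brand_name),
  KEY idx_brand_name (brand_name)
);

-- Exact string match on brand_name; the optimizer can use
-- idx_brand_name rather than scanning the whole table.
SELECT product_id, brand_name, price
FROM product_brands
WHERE brand_name = 'Brand4';
```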
This should work in MySQL (using a string function as mentioned here):
SELECT *
FROM brands
WHERE FIND_IN_SET('Brand4',brand_name);
see: DBFIDDLE
The provided SQL query will work in MySQL if you put a subquery within the parentheses, or if you use FIND_IN_SET instead of ANY.
But, as stated in the MySQL documentation:
This function does not work properly if the first argument contains a comma (,) character.
So, as an alternative, you could use LIKE (simple pattern matching).
Your SQL code then would be:
SELECT `brand_name`, `price`
FROM `test`
WHERE `brand_name` LIKE "%Brand4%"
See SQLFiddle for live example.
Also, you could use LOCATE.
Or any other alternative solution.
But I must say that storing list data the way you do is not the best practice.
There are plenty of ways this can be done better.
For example, using M:M (many-to-many) relationship.
If you made this design yourself, you really should reconsider and redesign it. Databases have their own data structures, and SQL is not an imperative language but a declarative one.
If you didn't design it, you should consider creating a table out of that one column. Perhaps this is what you are trying to do.
If you just need to locate a specific string in the values of a field, use LIKE:
SELECT Brand_name, price
FROM table_name
WHERE Brand_name LIKE '%Brand4%'
But realize that this will not always yield accurate results.
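To illustrate the accuracy problem, here is a small sketch using literal strings rather than a real table. It assumes the list is stored as plain comma-separated items with no spaces or brackets, which may not match the actual column format.

```sql
-- Plain substring match: 'Brand42' also contains 'Brand4'.
SELECT 'Brand1,Brand42' LIKE '%Brand4%' AS substring_match;                 -- 1

-- Wrapping both the list and the term in delimiters matches whole items only.
SELECT CONCAT(',', 'Brand1,Brand42', ',') LIKE '%,Brand4,%' AS item_match;  -- 0
SELECT CONCAT(',', 'Brand1,Brand4',  ',') LIKE '%,Brand4,%' AS item_match;  -- 1
```

The delimiter-wrapped form is more accurate, but like any leading-wildcard LIKE it still cannot use a conventional index.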

mySQL: nested match against query [duplicate]

I need to do a Fulltext search for a whole bunch of values out of a column in another table. Since MATCH() requires a value in the AGAINST() part, a straightforward: "SELECT a.id FROM a,b WHERE MATCH(b.content) AGAINST(a.name)" fails with "Incorrect arguments to AGAINST".
Now, I know I could write a script to query for a list of names and then search for them, but I'd much rather work out a more complex query that can handle it all at once. It doesn't need to be speedy, either.
Ideas?
thanks
Unfortunately, http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html says:
The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row.
Looks like you'll have to search for the patterns one at a time if you use MySQL's FULLTEXT index as your search solution.
The only alternative I can think of to allow searching for many patterns like you describe is an Inverted Index. Though this isn't as flexible or scalable as a true full-text search technology.
See my presentation http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
I hope my solution will be useful to you:
PREPARE stat FROM 'SELECT user_profession FROM users INNER JOIN professions ON MATCH(user_profession) AGAINST (?)';
SET @c_val = (SELECT prfs_profession FROM professions WHERE prfs_ID=1);
EXECUTE stat USING @c_val;

How do I make data retrieval faster? [duplicate]

I am building a search feature for the messages part of my site, and have a messages database with a little over 9,000,000 rows, and an index on the sender, subject, and message fields. I was hoping to use the MySQL LIKE clause in my query, for example:
SELECT sender, subject, message FROM Messages WHERE message LIKE '%EXAMPLE_QUERY%';
to retrieve results. Unfortunately, MySQL doesn't use indexes when a leading wildcard is present, and the leading wildcard is necessary because the search query could appear anywhere in the message (this is how the wildcards work, no?). Queries are very, very slow, and I cannot use a full-text index either, because of the annoying 50% rule (I just can't afford to rule that much out). Is there any way (or even any alternative) to optimize a query using LIKE and two wildcards? Any help is appreciated.
You should either use full-text indexes (you said you can't), design a full-text search by yourself or offload the search from MySQL and use Sphinx/Lucene. For Lucene you can use Zend_Search_Lucene implementation from Zend Framework or use Solr.
Normal indexes in MySQL are B+Trees, and they can't be used if the start of the string is not known (which is the case when you have a wildcard at the beginning).
Another option is to implement search on your own, using a reference table. Split the text into words and create a table that contains word, record_id. Then, when searching, split the query into words and look up each word in the reference table. This way you are not limiting yourself to the beginning of the whole text, but only to the beginning of a given word (and you'll match the rest of the words anyway).
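The reference-table idea could be sketched roughly as follows. The table name message_words, the message_id column, and the sample search words are assumptions; splitting the message text into words is left to application code and is not shown.

```sql
-- Hypothetical reference table: one row per (word, message) pair,
-- filled by application code whenever a message is stored.
CREATE TABLE message_words (
  word       VARCHAR(64) NOT NULL,
  message_id INT         NOT NULL,
  PRIMARY KEY (word, message_id)
);

-- Messages containing BOTH words "example" and "query":
-- look up each word through the index, then keep ids that matched all of them.
SELECT message_id
FROM message_words
WHERE word IN ('example', 'query')
GROUP BY message_id
HAVING COUNT(DISTINCT word) = 2;

-- Prefix search on a single word; no leading wildcard,
-- so the index on word can still be used.
SELECT DISTINCT message_id
FROM message_words
WHERE word LIKE 'exam%';
```

The resulting ids can then be joined back to the Messages table to fetch sender, subject, and message.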
'%EXAMPLE_QUERY%' is a very, very bad idea. I am going to give you some suggestions:
A. Avoid wildcards at the start of LIKE patterns; use 'EXAMPLE_QUERY%' instead.
B. Create Keywords where you can easily use MATCH
If you want to stick with using MySQL, you should use FULLTEXT indexes. Full-text indexes index the words in a text block. You can then search on whole words (or on word prefixes, using the * operator in boolean mode) and return the results in order of relevance. So you can find the word "example" within a block of text, but you still can't search efficiently on "xampl" to find "example".
MySQL's full text search is not great, but it is functional.
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
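A short sketch of what this might look like for the Messages table from the question. The index name ft_message is made up. According to the MySQL manual, boolean mode does not apply the 50% threshold that the question mentions, so it may be worth trying before ruling full-text search out.

```sql
-- Assumes the Messages table from the question; the index name is hypothetical.
ALTER TABLE Messages ADD FULLTEXT INDEX ft_message (message);

-- Whole-word search in the default natural language mode.
SELECT sender, subject, message
FROM Messages
WHERE MATCH(message) AGAINST ('example');

-- Prefix search in boolean mode: the trailing * matches
-- words that begin with "exam", such as "example".
SELECT sender, subject, message
FROM Messages
WHERE MATCH(message) AGAINST ('exam*' IN BOOLEAN MODE);
```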
select * from emp where ename like '%e';
returns the ename values that end with the letter e.
select * from emp where ename like 'A%';
returns the ename values that begin with the letter a.
select * from emp where ename like '_a%';
returns the ename values whose second letter is a.

Search a table with comma delimited string?

I have a column in my table called tags. It has comma delimited text in it (box, hat, car).
I select a row from my table and I want to find other rows that have similar tags to the tags in the row selected.
I know it could be a better table design, but I can't change the design.
I know this will search the tags for a keyword, but I don't want to search via a keyword but by a list of tags.
WHERE CONCAT(',', Tags, ',') LIKE '%,keyword,%'
Does anyone know how I would do this?
Using MYSQL.
You say you can't change the design. Can you add a FULLTEXT index and put this table in the MyISAM access method? If so you can use FULLTEXT searching. For your application you'll do best using BOOLEAN mode.
WHERE MATCH (tags)
AGAINST ('box hat -car' IN BOOLEAN MODE);
This particular search looks for keywords box and hat, and excludes the keyword car.
Here's a description. This might work well for you.
http://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html
You can run a FULLTEXT search in InnoDB. But I don't believe you can index it.
If you can't use a fulltext search, you're stuck with the wrong tool for your job. Matching comma-delimited strings with SQL is like driving nails by hitting them with the handle of a screwdriver. Both take a long time and are incredibly awkward.
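If full-text search really is unavailable, one possible pure-SQL workaround is to turn the selected row's tag list into a regular expression and test it against every other row. This is only a sketch: the table name items and the id column are hypothetical, and it assumes tags are stored without spaces around the commas and contain no regex metacharacters. It still scans the whole table.

```sql
-- Hypothetical table items(id, tags); tags look like 'box,hat,car'.
-- Build the pattern ',(box|hat|car),' from the selected row (id = 1)
-- and test it against every other row's delimiter-wrapped tag list.
SELECT t.id, t.tags
FROM items AS s
JOIN items AS t
  ON t.id <> s.id
 AND CONCAT(',', t.tags, ',')
     REGEXP CONCAT(',(', REPLACE(s.tags, ',', '|'), '),')
WHERE s.id = 1;
```

This returns every other row that shares at least one whole tag with the selected row.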

MySQL Fulltext search but using LIKE

I've recently been doing some string searches on a table with about 50k strings in it, fairly large I'd say but not that big. I was doing some nested queries for a 'search within results' kind of thing. I was using the LIKE operator to match a searched keyword.
I came across MySQL's full-text search, which I tried, so I added a fulltext index to my str column. I'm aware that full-text searches don't work on virtually created tables or even with views, so queries with sub-selects will not fit. As I mentioned, I was doing nested queries; an example is:
SELECT s2.id, s2.str
FROM
(
SELECT s1.id, s1.str
FROM
(
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
) AS s1
WHERE s1.str LIKE '%another_term%'
) AS s2
WHERE s2.str LIKE '%a_much_deeper_term%';
This is actually not applied to any code yet; I was just doing some tests. Also, searching strings like this can easily be done with Sphinx (performance-wise), but let's consider Sphinx not being available; I want to know how well this will work as a pure SQL query. Running this query on a table without a full-text index takes about 2.97 secs (depending on the search term). However, running this query on a table with a full-text index added to the str column finished in about 104ms, which is fast (I think?).
My question is simple, is it valid to use LIKE or is it a good practice to use it at all in a table with Full-text added when normally we would use MATCH and AGAINST statements?
Thanks!
In this case you do not necessarily need subselects. You can simply use:
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
AND str LIKE '%another_term%'
AND str LIKE '%a_much_deeper_term%'
... but this also raises a good question: the order in which you exclude the rows. I guess MySQL is smart enough to assume that the longest term will be the most restrictive, so starting with a_much_deeper_term it will eliminate most of the records and then perform additional comparisons on only a few rows. By contrast, if you start with term you will probably end up with many possible records that you then have to compare against the rest of the terms.
The interesting part is that you can force the order in which the comparison is made by using your original subselect example. This gives you the opportunity to decide which term is the most restrictive based on more than just the length, for example:
the ratio of consonants to vowels
the longest chain of consonants of the word
the most used vowel in the word
...etc. You can also apply some heuristics based on the type of textual information you are handling.
Edit:
This is just a hunch, but it could be possible to apply the LIKE to the words in the fulltext index itself, and then match the rows against the index as if you had searched for full words.
I'm not sure if this is actually done, but it would be a smart thing for the MySQL people to pull off. Also note that this theory can only work if all possible occurrences are in fact in the fulltext index. For this you need that:
Your search pattern must be at least the size of the minimal word length. (If you are searching for %id%, for example, it can be part of a 3-letter word too, which is excluded from the FULLTEXT index by default.)
Your search pattern must not be a substring of any listed excluded word, for example: and, of, etc.
Your pattern must not contain any special characters.
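One way to check the hunch rather than guess is to ask MySQL for the execution plan. The sketch below assumes the strings table from the question with a FULLTEXT index on str. I would expect the LIKE version to report a full table scan and the MATCH version to report the fulltext index, but that is an assumption to verify on your own server.

```sql
-- EXPLAIN reports whether a query uses an index or scans the table.
EXPLAIN
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
  AND str LIKE '%another_term%';

-- Rough MATCH ... AGAINST counterpart: + requires each word to be present.
-- It matches whole words only, so it is not a drop-in replacement
-- for a substring search with LIKE.
EXPLAIN
SELECT id, str
FROM strings
WHERE MATCH(str) AGAINST ('+term +another_term' IN BOOLEAN MODE);
```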