mySQL: nested match against query [duplicate] - mysql

I need to do a Fulltext search for a whole bunch of values out of a column in another table. Since MATCH() requires a value in the AGAINST() part, a straightforward: "SELECT a.id FROM a,b WHERE MATCH(b.content) AGAINST(a.name)" fails with "Incorrect arguments to AGAINST".
Now, I know I could write a script to query for a list of names and then search for them, but I'd much rather work out a more complex query that can handle it all at once. It doesn't need to be speedy, either.
Ideas?
thanks

Unfortunately, http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html says:
The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row.
Looks like you'll have to search for the patterns one at a time if you use MySQL's FULLTEXT index as your search solution.
The only alternative I can think of that allows searching for many patterns as you describe is an inverted index, though this isn't as flexible or scalable as a true full-text search technology.
See my presentation http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
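As a rough illustration of the inverted-index idea, here is a minimal sketch in Python rather than MySQL. The sample rows and the helper names (tokenize, build_inverted_index, match_names) are made up for this example. Each word maps to the set of row ids that contain it, so a whole list of names can be checked without one full-text query per name:

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(rows):
    # rows: dict of row id -> content; returns word -> set of row ids.
    index = defaultdict(set)
    for row_id, content in rows.items():
        for word in tokenize(content):
            index[word].add(row_id)
    return index

def match_names(index, names):
    # For each name, collect the ids of rows containing every word of it.
    result = {}
    for name in names:
        words = tokenize(name)
        if not words:
            result[name] = set()
            continue
        result[name] = set.intersection(*(index.get(w, set()) for w in words))
    return result

# Hypothetical stand-ins for table b (content) and table a (name).
b_rows = {1: "Alice met Bob in Paris", 2: "Bob went home", 3: "Nothing here"}
a_names = ["Alice", "Bob", "Carol"]

index = build_inverted_index(b_rows)
matches = match_names(index, a_names)
```

In a database, the same structure would typically be a table of (word, row id) pairs with an ordinary index on the word column, which a join against the names table can use.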

I hope my solution will be useful to you:
PREPARE stat FROM 'SELECT user_profession FROM users INNER JOIN professions ON MATCH(user_profession) AGAINST (?)';
SET @c_val = (SELECT prfs_profession FROM professions WHERE prfs_ID=1);
EXECUTE stat USING @c_val;

Related

How to return substring positions in LIKE query

I retrieve data from a MySQL database using a simple case-insensitive SELECT FROM WHERE LIKE query. I escape any % or _ in the search term and then surround it with % myself in the LIKE clause, so the user can only perform a basic text search and cannot inject wildcards or patterns.
For every row returned by this query, I have to search again using a JS script in order to find all the indexes of the substring in the original string. I dislike this method because it uses a different pattern matching than the LIKE query, so I can't guarantee that the algorithm is the same.
I found the MySQL functions POSITION and LOCATE, which can do this, but they return only the first index if the substring was found, or 0 if it was not. You can set the position to start searching from, and by repeatedly passing a position just past the previously returned index until the result is 0, you can find all indexes of the substring. But that means a lot of additional queries, and it might end up slowing down my application a lot.
So I'm now wondering: is there a way to have the LIKE query return the substring positions directly? I didn't find any, probably because I lack the MySQL vocabulary (I'm a noob).
Simple answer: No.
Longer answer: MySQL has no syntax or mechanism to return an array of anything -- from either a SELECT or even a Stored Procedure.
Maybe answer: You could write a Stored procedure that loops through one result, finding the results and packing them into a commalist. But I cringe at how messy that code would be. I would quickly decide to write JS code, as you have already done.
Moral of the story: SQL is not a full language. It's great at storing and efficiently retrieving large sets of rows, but lousy at string manipulation or arrays (other than "rows").
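For reference, the "search again from the last hit" loop described in the question is short when done client-side. Below is a minimal sketch in Python (the question uses JS; the function name is made up here). It returns 1-based positions, as LOCATE() does. Lowercasing both strings is only an approximation of a case-insensitive MySQL collation, so results may differ for some non-ASCII text:

```python
def find_all_positions(haystack, needle, overlapping=True):
    # Case-insensitive search returning 1-based positions, like LOCATE().
    # Assumption: lower() stands in for the column's collation rules.
    if not needle:
        return []
    h, n = haystack.lower(), needle.lower()
    positions = []
    start = 0
    while True:
        idx = h.find(n, start)
        if idx == -1:
            break
        positions.append(idx + 1)
        # Step one character on to allow overlapping hits,
        # or jump past the whole match to skip them.
        start = idx + 1 if overlapping else idx + len(n)
    return positions
```

This runs once per returned row in the application, with no extra round trips to the database.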
Commalist
If you are actually searching a simple list of things separated by commas, then FIND_IN_SET() and SUBSTRING_INDEX() in MySQL come close to what JS can do with its split (on comma) method on strings.

Extracting a value from an Array using mysql

I have a column that has brand names in an array format as below:
I want to extract information associated with Brand4 for example 'price'.
I tried using the below, but that's a psql query. How can I extract this information using MySQL in GCP.
SELECT Brand_name, price
FROM table_name
Where 'Brand4'=Any(Brand_name)
First, the explanation for your error message is that in MySQL, ANY() accepts a subquery, not just a single column or expression. See https://dev.mysql.com/doc/refman/8.0/en/any-in-some-subqueries.html
MySQL does not have an array type. Your Brand_name column is not an array, it's a string. It happens to contain commas and square brackets, but these are just characters in a string.
So your solutions are to use various string-search functions or expressions, as other folks have suggested.
The downside to all the string-search functions is that they cannot be optimized with a conventional index. So every search will be expensive, because it requires a table-scan.
Another solution I did not see yet is to use a fulltext index.
alter table brands add fulltext index (brand_name);
select * from brands
where match(brand_name) against ('Brand4' in boolean mode);
This may require some special handling if the brand names contain spaces or punctuation, but if they are plain words, it should work.
Read https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html to understand more about fulltext indexes.
The best solution would be to eliminate this fake "array" column by normalizing the schema to store one brand per row in another table. Then you can match strings exactly and optimize with a conventional index. But I understand you said that the table structure is not up to you.
This should work in MySQL (using a string function, as mentioned here):
SELECT *
FROM brands
WHERE FIND_IN_SET('Brand4',brand_name);
see: DBFIDDLE
The provided SQL query will work in MySQL if you put a subquery within the parentheses, or use FIND_IN_SET instead of ANY.
But, as stated in the MySQL documentation:
This function does not work properly if the first argument contains a
comma (,) character.
So, as an alternative, you could use LIKE (simple pattern matching).
Your SQL code then would be:
SELECT `brand_name`, `price`
FROM `test`
WHERE `brand_name` LIKE "%Brand4%"
See SQLFiddle for live example.
Also, you could use LOCATE.
Or any other alternative solution.
But I must say that storing list data the way you do is not the best practice out there.
There are plenty of ways this can be done better.
For example, using M:M (many-to-many) relationship.
If you made this design, you really have to reconsider and redesign it. Databases have their own data structures, and SQL is not an imperative language but a declarative one.
If you did not design it, you should consider creating a separate table out of that one column. Perhaps this is what you are trying to do.
If it is just locating a specific string in the values of a field use like
SELECT Brand_name, price
FROM table_name
WHERE Brand_name LIKE '%Brand4%'
But realize this will not always yield accurate results.
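To see why the LIKE '%Brand4%' approach can be inaccurate, here is a small Python sketch comparing a substring test with an exact match on the parsed list items. The sample values and the bracketed, comma-separated format are assumptions, since the original data was not shown, and the helper names are made up:

```python
def parse_brand_list(raw):
    # Assumed format: "[Brand1, Brand2, Brand40]" stored as a plain string.
    inner = raw.strip().strip("[]")
    return [item.strip().strip("'\"") for item in inner.split(",") if item.strip()]

def substring_match(raw, brand):
    # Similar in spirit to: brand_name LIKE '%<brand>%'
    return brand in raw

def exact_match(raw, brand):
    # True only when one list item equals the brand exactly.
    return brand in parse_brand_list(raw)

rows = ["[Brand1, Brand4]", "[Brand40, Brand7]", "[Brand2]"]
like_hits = [r for r in rows if substring_match(r, "Brand4")]
exact_hits = [r for r in rows if exact_match(r, "Brand4")]
```

With this sample data the substring test also picks up the row containing Brand40, while the exact match does not. A normalized table with one brand per row gives the exact behaviour directly in SQL.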

How can you index a text search that has only one wildcard at the beginning of the search term ('%TERM')?

I found the following query in our MySQL slow query log:
SELECT target_status
FROM link_repository
WHERE target_url LIKE CONCAT('%', 'bundle/rpi_/activity/rpi_bridge/bridge_manual.pdf')
When I pointed this out to the developer manager in a conversation about slow page loads, he stated:
come on; concat() is a simple string concatenation and '%' is the wildcard in the search string. I know that searching strings is not the fastest of operations (that's why we have lucene-like engines, but this is trivial stuff)
There's about 18k rows in link_repository, which isn't much. The documentation I'm finding is that indexing on character strings doesn't work with wildcards. Is there an alternative strategy one can use?
For LIKE to use an index, the pattern has to start with a literal prefix. MySQL compares index entries from left to right, so if the pattern starts with a wildcard, MySQL will do a table scan and no ordinary index will help.
However, if you are using InnoDB tables you can try to use Full-Text Index.
You can add a Full-Text Index on the column, use MATCH ... AGAINST to find candidate rows, and then add a RIGHT() condition to keep only the results that end with your string.
CREATE FULLTEXT INDEX target_url ON link_repository(target_url);
Then you can query the records like so
SELECT target_status
FROM link_repository
WHERE MATCH (target_url) AGAINST ('bundle/rpi_/activity/rpi_bridge/bridge_manual.pdf') AND RIGHT(target_url, 49) = 'bundle/rpi_/activity/rpi_bridge/bridge_manual.pdf'
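To illustrate why a leading wildcard defeats an ordinary index, here is a Python sketch that uses a sorted list as a stand-in for a B-tree index. A prefix lookup can binary-search to the first candidate, while a suffix lookup has no such starting point. One workaround that is not part of the answer above, and is shown here only as an assumption, is to keep a reversed copy of the value so that a suffix search becomes a prefix search. The URLs and function names are made up:

```python
from bisect import bisect_left

def prefix_range(sorted_keys, prefix):
    # Binary-search to the first key >= prefix, then walk the
    # contiguous run of keys that start with it.
    lo = bisect_left(sorted_keys, prefix)
    hi = lo
    while hi < len(sorted_keys) and sorted_keys[hi].startswith(prefix):
        hi += 1
    return sorted_keys[lo:hi]

def suffix_search(urls, suffix):
    # Hypothetical workaround: index the reversed URLs, then look up
    # the reversed suffix as a prefix.
    reversed_index = sorted(u[::-1] for u in urls)
    return sorted(k[::-1] for k in prefix_range(reversed_index, suffix[::-1]))

urls = [
    "http://example.com/bundle/a/bridge_manual.pdf",
    "http://example.com/other/readme.txt",
    "http://example.org/docs/bridge_manual.pdf",
]
hits = suffix_search(urls, "bridge_manual.pdf")
```

In MySQL terms this would correspond to storing REVERSE(target_url) in an extra indexed column and querying it with a trailing wildcard only. Whether that is worth it for about 18k rows is a judgment call.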

How can I make this SQL non sargable?

I've used an online tool to analyse one of my SQL queries (the query took me ages to make).
My query takes a word (in this example the word is 'dog.') and tries to find it in the 'qa' table. When it finds a match, it joins in row data from the login table where login.pid = qa.u.
SELECT login.pid,login.name,
qa.id,qa.end,qa.react,qa.win,qa.stock,qa.num,qa.ratio,qa.u,qa.t,qa.k,qa.swipes,qa.d
FROM login,qa WHERE login.pid=qa.u AND (qa.k LIKE '%dog.%' OR qa.k='.dog.')
ORDER BY qa.d DESC LIMIT 0,15
I understand what the tool is telling me:
Argument with leading wildcard
An argument has a leading wildcard character, such as "%foo". The predicate with
this argument is not sargable and cannot use an index if one exists.
but I don't know how to use an index inside the '()' without damaging or changing the results... could someone please explain how I could use an index in the middle of a query's conditions?
I take it that if this was non-sargable then the result would be faster?
First, learn to use modern join syntax:
SELECT login.pid, login.name,
qa.id, qa.end, qa.react, qa.win, qa.stock, qa.num, qa.ratio, qa.u, qa.t,qa.k, qa.swipes, qa.d
FROM login join
qa
on login.pid = qa.u
WHERE (qa.k LIKE '%dog.%' OR qa.k = '.dog.')
ORDER BY qa.d DESC
LIMIT 0,15;
Basically "sargable" means that you can use an index on a particular expression (it is not an English word, it is an acronym). The expression on qa.k cannot use an index.
This may not make a difference, depending on the query plan for the query. For instance, if the engine decides to scan the login table and then lookup values in qa, the index wouldn't help. It helps going the other way, though.
The bad news is that you cannot make this expression sargable in MySQL. The good news is that you can use a full text index to do what you want and possibly more. You can read about them here. One small note is that the default settings ignore short words, up to three letters. So you need to change the default setting if you actually want to search for "dog".
By the way, the following expression can use an index on qa.k:
WHERE (qa.k LIKE 'dog.%' OR qa.k = '.dog.')
(I'm not sure if MySQL actually would use the index, because it sometimes gets confused by or.)

MySQL Fulltext search but using LIKE

I'm recently doing some string searches from a table with about 50k strings in it, fairly large I'd say but not that big. I was doing some nested queries for a 'search within results' kinda thing. I was using LIKE statement to get a match of a searched keyword.
I came across MySQL's Full-Text search, which I tried, so I added a fulltext index to my str column. I'm aware that Full-Text searches don't work on derived tables or even with Views, so queries with sub-selects will not fit. As I mentioned, I was doing nested queries, for example:
SELECT s2.id, s2.str
FROM
(
SELECT s1.id, s1.str
FROM
(
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
) AS s1
WHERE s1.str LIKE '%another_term%'
) AS s2
WHERE s2.str LIKE '%a_much_deeper_term%';
This is actually not applied to any code yet, I was just doing some tests. Also, searching strings like this can be easily achieved by using Sphinx (performance wise) but let's consider Sphinx not being available and I want to know how this will work well in pure SQL query. Running this query on a table without Full-text added takes about 2.97 secs. (depends on the search term). However, running this query on a table with Full-text added to the str column finished in like 104ms which is fast (i think?).
My question is simple, is it valid to use LIKE or is it a good practice to use it at all in a table with Full-text added when normally we would use MATCH and AGAINST statements?
Thanks!
In this case you do not necessarily need subselects. You can simply use:
SELECT id, str
FROM strings
WHERE str LIKE '%term%'
AND str LIKE '%another_term%'
AND str LIKE '%a_much_deeper_term%'
... but this also raises a good question: the order in which you are excluding the rows. I guess MySQL is smart enough to assume that the longest term will be the most restrictive, so starting with a_much_deeper_term it will eliminate most of the records and then perform additional comparisons only on a few rows. By contrast, if you start with term, you will probably end up with many possible records that you then have to compare against the rest of the terms.
The interesting part is that you can force the order in which the comparisons are made by using your original subselect example. This gives you the opportunity to decide which term is the most restrictive based upon more than just the length, for example:
the ratio of consonants to vowels
the longest chain of consonants of the word
the most used vowel in the word
...etc. You can also apply some heuristics based on the type of textual information you are handling.
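The idea of applying the most restrictive term first can be sketched in Python as below. Term length is used as the guess for restrictiveness, and the function name and the check counter are made up for illustration. Whether MySQL itself reorders the LIKE conditions this way is not confirmed here:

```python
def filter_by_terms(strings, terms):
    # Apply the longest term first, on the guess that it is the most
    # restrictive, then test the remaining terms only on the survivors.
    ordered = sorted(terms, key=len, reverse=True)
    candidates = list(strings)
    checks = 0  # number of substring comparisons performed
    for term in ordered:
        needle = term.lower()
        survivors = []
        for s in candidates:
            checks += 1
            if needle in s.lower():
                survivors.append(s)
        candidates = survivors
    return candidates, checks
```

Swapping the sort key for another heuristic (such as the consonant-to-vowel ratio mentioned above) only changes the order of the passes, not the final result.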
Edit:
This is just a hunch, but it could be possible to apply the LIKE to the words in the fulltext index itself, then match the rows against the index as if you had searched for full words.
I'm not sure if this is actually done, but it would be a smart thing for the MySQL people to pull off. Also note that this theory can only work if all possible occurrences are in fact in the fulltext index. For this you need that:
Your search pattern must be at least as long as the minimum word length. (If you are searching for %id%, for example, it can also be part of a 3-letter word, which is excluded from the FULLTEXT index by default.)
Your search pattern must not be a substring of any listed excluded word for example: and, of etc.
Your pattern must not contain any special characters.
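The three conditions above can be sketched as a small check in Python. The minimum word length of 4 and the tiny stopword list are illustrative assumptions; the real values depend on the storage engine and server configuration. "Special characters" is interpreted here as anything other than letters and digits:

```python
import re

def pattern_covered_by_fulltext(pattern, min_word_len=4,
                                stopwords=("and", "of", "about")):
    # Strip the surrounding LIKE wildcards to get the literal part.
    core = pattern.strip("%")
    # 1. The pattern must be at least as long as the minimum word length.
    if len(core) < min_word_len:
        return False
    # 2. It must not be a substring of any excluded (stop) word.
    if any(core.lower() in word for word in stopwords):
        return False
    # 3. It must not contain special characters (assumed: non-alphanumerics).
    if not re.fullmatch(r"[A-Za-z0-9]+", core):
        return False
    return True
```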