MySQL order by "best match" - mysql

I have a table that contains words and an input field to search that table using a live search. Currently, I use the following query to search the table:
SELECT word FROM words WHERE word LIKE '%searchstring%' ORDER BY word ASC
Is there a way to order the results so that the ones where the string is found at the beginning of the word come first and those where the string appears later in the word come last?
An example: searching for 'hab' currently returns
alphabet
habit
rehab
but I'd like it this way:
habit (first because 'hab' is the beginning)
alphabet (second because 'hab' is in the middle of the word)
rehab (last because 'hab' is at the end of the word)
or at least this way:
habit (first because 'hab' is the beginning)
rehab (second because 'hab' starts at the third letter)
alphabet (last because 'hab' starts latest, at the fourth letter)
Would be great if anyone could help me out with this!

To do it the first way (starts word, in the middle of the word, ends word), try something like this:
SELECT word
FROM words
WHERE word LIKE '%searchstring%'
ORDER BY
CASE
WHEN word LIKE 'searchstring%' THEN 1
WHEN word LIKE '%searchstring' THEN 3
ELSE 2
END
To do it the second way (position of the matched string), use the LOCATE function:
SELECT word
FROM words
WHERE word LIKE '%searchstring%'
ORDER BY LOCATE('searchstring', word)
You may also want a tie-breaker in case, for example, more than one word starts with hab. To do that, I'd suggest:
SELECT word
FROM words
WHERE word LIKE '%searchstring%'
ORDER BY <whatever>, word
In the case of multiple words starting with hab, the words starting with hab will be grouped together and sorted alphabetically.
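The CASE-based ordering with an alphabetical tie-breaker can be tried out on a throwaway in-memory database. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (the LIKE and CASE syntax used here is the same in both, but string concatenation is `||` in SQLite where MySQL would use CONCAT), and the sample words, including habitat, are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL `words` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("alphabet",), ("habit",), ("rehab",), ("habitat",)],
)

rows = conn.execute(
    """
    SELECT word
    FROM words
    WHERE word LIKE ('%' || :t || '%')
    ORDER BY
      CASE
        WHEN word LIKE (:t || '%') THEN 1   -- starts with the search string
        WHEN word LIKE ('%' || :t) THEN 3   -- ends with the search string
        ELSE 2                              -- somewhere in the middle
      END,
      word                                  -- alphabetical tie-breaker
    """,
    {"t": "hab"},
).fetchall()

ranked = [r[0] for r in rows]
print(ranked)  # ['habit', 'habitat', 'alphabet', 'rehab']
```

Passing the search string as a bound parameter rather than pasting it into the SQL text also keeps user input out of the query string.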

Try this way:
SELECT word
FROM words
WHERE word LIKE '%searchstring%'
ORDER BY CASE WHEN word = 'searchstring' THEN 0
WHEN word LIKE 'searchstring%' THEN 1
WHEN word LIKE '%searchstring' THEN 3
ELSE 2
END, word ASC

You could use the INSTR function to return the starting position of the search string within the word,
ORDER BY INSTR(word,searchstring)
To make the resultset more deterministic when the searchstring appears in the same position in two different words, add a second expression to the ORDER BY:
ORDER BY INSTR(word,searchstring), word
(For example, searchstring hab appears in second position of both chablis and shabby)
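As a rough check of the position-based ordering, the snippet below runs the same ORDER BY on a few sample words. It is a sketch using Python's sqlite3 module instead of MySQL; SQLite's INSTR takes its arguments in the same order and also returns a 1-based position or 0. The word list is an assumption made up for the example.

```python
import sqlite3

# In-memory SQLite table standing in for the MySQL `words` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("alphabet",), ("habit",), ("rehab",), ("shabby",), ("chablis",)],
)

by_position = conn.execute(
    """
    SELECT word, INSTR(word, :t) AS pos
    FROM words
    WHERE INSTR(word, :t) > 0          -- keep only rows that contain the string
    ORDER BY INSTR(word, :t), word     -- earliest match first, then alphabetical
    """,
    {"t": "hab"},
).fetchall()

print(by_position)
# [('habit', 1), ('chablis', 2), ('shabby', 2), ('rehab', 3), ('alphabet', 4)]
```

Without the second ORDER BY expression, the relative order of chablis and shabby would not be guaranteed.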

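In your case it would be:
ORDER BY INSTR(word, 'searchstring')
INSTR searches the word column for 'searchstring' and returns its 1-based position, or 0 if there is no match. Note that INSTR takes a plain substring, not a LIKE pattern, so the % wildcards must be left out. With the WHERE ... LIKE filter in place every row has a position of at least 1, so words that match earlier sort first.
You can also add DESC to reverse the direction, e.g.:
ORDER BY INSTR(word, 'searchstring') DESC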

I got the best match for multiple columns using this query:
SELECT
id,
wordA,
wordB,
-- The ends-with test must come before the contains test, otherwise it is never reached.
(CASE
WHEN wordA = 'keywordA' THEN 0
WHEN wordA LIKE 'keywordA%' THEN 1
WHEN wordA LIKE '%keywordA' THEN 3
WHEN wordA LIKE '%keywordA%' THEN 2
ELSE 4
END) AS 'wordA_keywordA_score',
(CASE
WHEN wordA = 'keywordB' THEN 0
WHEN wordA LIKE 'keywordB%' THEN 1
WHEN wordA LIKE '%keywordB' THEN 3
WHEN wordA LIKE '%keywordB%' THEN 2
ELSE 4
END) AS 'wordA_keywordB_score',
(CASE
WHEN wordB = 'keywordA' THEN 0
WHEN wordB LIKE 'keywordA%' THEN 1
WHEN wordB LIKE '%keywordA' THEN 3
WHEN wordB LIKE '%keywordA%' THEN 2
ELSE 4
END) AS 'wordB_keywordA_score',
(CASE
WHEN wordB = 'keywordB' THEN 0
WHEN wordB LIKE 'keywordB%' THEN 1
WHEN wordB LIKE '%keywordB' THEN 3
WHEN wordB LIKE '%keywordB%' THEN 2
ELSE 4
END) AS 'wordB_keywordB_score'
FROM
words
WHERE
wordA like '%keywordA%'
OR
wordA like '%keywordB%'
OR
wordB like '%keywordA%'
OR
wordB like '%keywordB%'
ORDER BY
(
wordA_keywordA_score +
wordA_keywordB_score +
wordB_keywordA_score +
wordB_keywordB_score
)
ASC;
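Since the four CASE expressions differ only in column and keyword, they can be generated instead of written out by hand. The following is a sketch under several assumptions: it uses Python's sqlite3 module rather than MySQL, `score_expr` is a hypothetical helper, the sample rows are invented, and the ends-with test is placed before the contains test so that a score of 3 is actually reachable.

```python
import sqlite3

def score_expr(column, param):
    """Build one CASE score for a trusted column name and a named bind parameter."""
    return (
        f"CASE WHEN {column} = :{param} THEN 0 "
        f"WHEN {column} LIKE (:{param} || '%') THEN 1 "
        f"WHEN {column} LIKE ('%' || :{param}) THEN 3 "
        f"WHEN {column} LIKE ('%' || :{param} || '%') THEN 2 "
        f"ELSE 4 END"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER, wordA TEXT, wordB TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [(1, "redwood", "tree"), (2, "red", "wood"), (3, "blue", "sky"), (4, "bored", "plywood")],
)

# One score per (column, keyword) pair; lower totals are better matches.
pairs = [(c, k) for c in ("wordA", "wordB") for k in ("ka", "kb")]
total = " + ".join(f"({score_expr(c, k)})" for c, k in pairs)

# A row with no match at all scores 4 for every pair, so filter those out.
sql = (
    f"SELECT id, {total} AS total FROM words "
    f"WHERE {total} < {4 * len(pairs)} ORDER BY total, id"
)
scored = conn.execute(sql, {"ka": "red", "kb": "wood"}).fetchall()
print(scored)  # [(2, 8), (1, 12), (4, 14)]
```

Only the keywords are bound as parameters; the column names are interpolated into the SQL text, so they must come from a fixed list and never from user input.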

Related

How to query for a phrase on SQL database of words?

I am using MySQL and I have an SQL database of songs with a table that consists of 8 columns of information on the words of a song. Each row represents a single word from the song's lyrics:
songSerial - the serial number of the song
songName - the song name
word - a single word from the song's lyrics
row_number - the number of the row that the word is found
word_position_in_row - the number of the word in the row alone
house_number - the number of the house the word belongs to
house_row - the number of the row in the house that the word is found in
word_number - the number of the word out of all the songs lyrics
example for a row: { 4 , The Scientist , secrets , 8 , 4 , 2 , 1 , 37 }
Now I want to query all the songs that contain a group of words, for instance all the songs that have the sentence "I Love You" in them. The words must be in that order and must not come from different rows or houses.
Here are scripts in my OneDrive for creating the database table and about 400 rows:
TwoTextScriptFilesAndTheirZip
Can anyone help ?
Thank you
One method is to use joins:
select sw1.*
from songwords sw1 join
songwords sw2
on sw2.songSerial = sw1.songSerial and
sw2.word_number = sw1.word_number + 1 join
songwords sw3
on sw3.songSerial = sw2.songSerial and
sw3.word_number = sw2.word_number + 1
where sw1.word = 'I' and sw2.word = 'love' and sw3.word = 'you';
Or, if you prefer:
where concat_ws(' ', sw1.word, sw2.word, sw3.word) = 'I love you'
This is worse from an optimization perspective (indexes using word do not help performance), but it is clear what the query is doing.
Searches of this type suggest using a full text index. The only caveat is that you will need to remove the stop word list and index all words, regardless of length. ("I" and "you" are typical examples of stop words.)
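The join-based query from this answer can be checked against a tiny data set. This is a sketch only: it runs on Python's sqlite3 module rather than MySQL, keeps just the three columns the join needs, and the two "songs" are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songwords (songSerial INTEGER, word TEXT, word_number INTEGER)"
)

# Hypothetical lyrics; word_number is the running position of each word in the song.
lyrics = {1: "and I love you so", 2: "I you love"}
for serial, text in lyrics.items():
    conn.executemany(
        "INSERT INTO songwords VALUES (?, ?, ?)",
        [(serial, w, n) for n, w in enumerate(text.split(), start=1)],
    )

rows = conn.execute(
    """
    SELECT DISTINCT sw1.songSerial
    FROM songwords sw1
    JOIN songwords sw2
      ON sw2.songSerial = sw1.songSerial AND sw2.word_number = sw1.word_number + 1
    JOIN songwords sw3
      ON sw3.songSerial = sw2.songSerial AND sw3.word_number = sw2.word_number + 1
    WHERE sw1.word = 'I' AND sw2.word = 'love' AND sw3.word = 'you'
    """
).fetchall()

matches = [r[0] for r in rows]
print(matches)  # [1]
```

Song 2 contains all three words but not consecutively, so only song 1 is returned.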
Assuming word is not null, we could do something like the following, although it is an expensive approach for a large table. Note the explicit SEPARATOR ' ': GROUP_CONCAT joins values with a comma by default, which would never match the phrase.
SET group_concat_max_len = 16777216 ;
SELECT t.songserial
, t.house_number
, t.row_number
FROM mytable t
GROUP
BY t.songserial
, t.house_number
, t.row_number
HAVING CONCAT(' ',GROUP_CONCAT(t.word ORDER BY t.word_position_in_row SEPARATOR ' '),' ')
LIKE CONCAT('% ','I love you',' %')
We would definitely want a suitable index available, e.g.
... ON `mytable` (`songserial`,`house_number`,`row_number`,`word`)
If one of the words in the phrase is infrequent, we might be able to optimize a bit with a search for that infrequent word first, and then get all of the words on the same row ...
SELECT t.songserial
, t.house_number
, t.row_number
FROM ( SELECT r.songserial
, r.house_number
, r.row_number
FROM mytable r
WHERE r.word = 'love'
GROUP
BY r.word
, r.songserial
, r.house_number
, r.row_number
) s
JOIN mytable t
ON t.songserial = s.songserial
AND t.house_number = s.house_number
AND t.row_number = s.row_number
GROUP
BY t.songserial
, t.house_number
, t.row_number
HAVING CONCAT(' ',GROUP_CONCAT(t.word ORDER BY t.word_position_in_row SEPARATOR ' '),' ')
LIKE CONCAT('% ','I love you',' %')
That inline view s would benefit from a covering index with word as the leading column
... ON `mytable` (`word`,`songserial`,`house_number`,`row_number`)
You look for these words and relative search positions: 1 = I, 2 = love, 3 = you. Let's compare them with two song lines (diff = real pos - search pos):
Line 1: "And I love, love, love you"
word:        And   I     love  love  love  you
real pos:    1     2     3     4     5     6
search pos:  -     1     2     2     2     3
diff:        -     1     1     2     3     3
Line 2: "I miss you and I love you"
word:        I     miss  you   and   I     love  you
real pos:    1     2     3     4     5     6     7
search pos:  1     -     3     -     1     2     3
diff:        0     -     0     -     4     4     4
If we look at the position deltas of the first line, we get 1 (twice), 2 (once), and 3 (twice).
For the second line we get deltas 0 (twice), and 4 (thrice).
So for the second song line we find a delta with as many matches as there are search words; for the first line we do not. The second line is a match.
And here is the query. I assume we have a temporary table search filled with the search words and relative positions for readability.
select distinct w.songserial, w.songname, w.house_number
from words w
join search s on s.word = w.word
group by
w.songserial, w.songname, w.row_number, w.house_number, w.house_row, -- song line
w.word_position_in_row - s.pos -- delta
having count(*) = (select count(*) from search);
This query is based on:
a song is identified by songserial + songname + house_number
a song line is identified by songserial + songname + row_number + house_number + house_row
This may be wrong; I don't know what house and house number mean in reference to a song. But that'll be easy to adjust.
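The delta idea can be reproduced on the two example lines. The sketch below makes some simplifying assumptions: it uses Python's sqlite3 module instead of MySQL, reduces the song-line key to a song serial plus a hypothetical `line_no` column, and strips the commas from the lyrics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (songserial INTEGER, line_no INTEGER, "
    "word_position_in_row INTEGER, word TEXT)"
)
conn.execute("CREATE TABLE search (pos INTEGER, word TEXT)")

# The two song lines from the explanation, one line per song.
lines = {1: "And I love love love you", 2: "I miss you and I love you"}
for serial, line in lines.items():
    conn.executemany(
        "INSERT INTO words VALUES (?, 1, ?, ?)",
        [(serial, n, w) for n, w in enumerate(line.split(), start=1)],
    )
conn.executemany("INSERT INTO search VALUES (?, ?)", [(1, "I"), (2, "love"), (3, "you")])

rows = conn.execute(
    """
    SELECT DISTINCT w.songserial
    FROM words w
    JOIN search s ON s.word = w.word
    GROUP BY w.songserial, w.line_no,          -- song line
             w.word_position_in_row - s.pos    -- delta
    HAVING COUNT(*) = (SELECT COUNT(*) FROM search)
    """
).fetchall()

matching_songs = [r[0] for r in rows]
print(matching_songs)  # [2]
```

Only the second line has a single delta (4) shared by all three search words, so only song 2 is returned.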

mysql search words in text column sorted by # words matched

Problem:
The text input will be 3 or 4 words.
I want to show the rows whose field contains at least one of these words.
For example, if the words are "alpha bravo charlie delta", I want to allow these results:
CHARLIE BRAVO
my name is CHARLIE
what is ALPHAness
ALPHA and DELTA
adDELTAs
BRAVO
DELTA and ALPHA and BRAVO
bbbBRAVOooo CHARLIEeeee
No problem so far; I use the query:
select * from subject where name like '%alpha%'
or name like '%bravo%' or name like '%charlie%'
or name like '%delta%'
But I want to show the results in a particular order.
A result should be more relevant the more of the searched words it contains,
so "CHARLIE BRAVO" shows up before "BRAVO".
I found a solution for that:
select *
, (
(char_length(col1) - char_length(replace(col1,'alpha','')))
/ char_length('alpha')
+
(char_length(col1) - char_length(replace(col1,'bravo','')))
/ char_length('bravo')
+
(char_length(col1) - char_length(replace(col1,'delta','')))
/ char_length('delta')
+
(char_length(col1) - char_length(replace(col1,'charlie','')))
/ char_length('charlie')
) as Occurances
from YourTable
order by
Occurances desc
But I need additional ordering rules:
a record that begins with a searched word is most relevant, e.g. "ALPHA and..."
a record in which a word begins with a searched word is next, e.g. "what is ALPHAness"
a record with the searched word inside a word comes last, e.g. "adDELTAs"
I found a solution for this ordering problem too, but
how can I combine both?
select id, name
from subjects
where name like '%alpha%'
order by
name like 'alpha%' desc,
ifnull(nullif(instr(name, ' alpha'), 0), 99999),
ifnull(nullif(instr(name, 'alpha'), 0), 99999),
name;
So, to conclude, if I search for "alpha bravo" the results should be:
DELTA and ALPHA and BRAVO (contain both words so is the first)
ALPHA and DELTA (begin with the first word searched)
BRAVO (begin with the second word searched)
what is ALPHAness (has the first word searched as begin of a word)
CHARLIE BRAVO (has the second word searched as begin of a word)
bbbBRAVOooo charlieeee (has the second word searched inside)
PS: I need the search to be case-insensitive and accent-insensitive (òàùèìé), so è = e.
Looks like you need a stored function to calculate the weight to be used in the ordering.
E.g. the weight starts at 0.
If a search word is found in the field, weight += 1000
If the field begins with the search word, weight += 100
If a word in the field begins with the search word, weight += 10
weight += (number of search words - index of the search word), so earlier search words count for more
So passing the search "alpha bravo" returns
1000+10+1 + 1000+10 DELTA and ALPHA and BRAVO (contain both words so is the first)
1000+100+1 ALPHA and DELTA (begin with the first word searched)
1000+100 BRAVO (begin with the second word searched)
1000+10+1 what is ALPHAness (has the first word searched as begin of a word)
1000+10 CHARLIE BRAVO (has the second word searched as begin of a word)
1000 bbbBRAVOooo charlieeee (has the second word searched inside)
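To make the weighting rules concrete, here is a sketch of the same logic in Python rather than as a MySQL stored function. The function name `weight` is made up, and one assumption is read off the sums above: the +100 and +10 bonuses are treated as mutually exclusive, since "ALPHA and DELTA" is listed as 1000+100+1.

```python
def weight(record, terms):
    """Score a record against an ordered list of search terms (higher is better)."""
    text = record.lower()
    total = 0
    for index, term in enumerate(terms, start=1):
        term = term.lower()
        if term not in text:
            continue
        total += 1000                      # the term occurs somewhere in the record
        if text.startswith(term):
            total += 100                   # the record begins with the term
        elif any(w.startswith(term) for w in text.split()):
            total += 10                    # some word in the record begins with the term
        total += len(terms) - index        # earlier search terms count slightly more
    return total

records = [
    "CHARLIE BRAVO",
    "what is ALPHAness",
    "ALPHA and DELTA",
    "BRAVO",
    "DELTA and ALPHA and BRAVO",
    "bbbBRAVOooo charlieeee",
]
terms = ["alpha", "bravo"]
ordered = sorted(records, key=lambda r: weight(r, terms), reverse=True)
weights = [weight(r, terms) for r in ordered]
print(list(zip(weights, ordered)))
```

With these inputs the weights come out as 2021, 1101, 1100, 1011, 1010 and 1000, which reproduces the order listed above.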
I found this solution; it is not elegant, but it works:
select *
,
(
(10*(col1 like 'alpha%'))+
(8*(col1 like '% alpha%'))+
(3*(col1 like '%alpha%' and col1 not like "% alpha%" and col1 not like 'alpha%'))+
(9*(col1 like 'bravo%'))+
(7*(col1 like '% bravo%'))+
(3*(col1 like '%bravo%' and col1 not like "% bravo%" and col1 not like 'bravo%'))
) as score
from YourTable where col1 like '%alpha%' or col1 like '%bravo%'
order by
score desc
http://sqlfiddle.com/#!9/91971/4
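The scoring query can also be run locally against the example records from the question. This is a sketch that assumes Python's sqlite3 module as a substitute for MySQL; like MySQL's default collations, SQLite's LIKE ignores case for ASCII letters, and a LIKE comparison evaluates to 1 or 0 so it can be multiplied by a weight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (col1 TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?)",
    [
        ("CHARLIE BRAVO",),
        ("my name is CHARLIE",),
        ("what is ALPHAness",),
        ("ALPHA and DELTA",),
        ("adDELTAs",),
        ("BRAVO",),
        ("DELTA and ALPHA and BRAVO",),
        ("bbbBRAVOooo CHARLIEeeee",),
    ],
)

rows = conn.execute(
    """
    SELECT col1,
      (10 * (col1 LIKE 'alpha%')) +
      (8 * (col1 LIKE '% alpha%')) +
      (3 * (col1 LIKE '%alpha%' AND col1 NOT LIKE '% alpha%' AND col1 NOT LIKE 'alpha%')) +
      (9 * (col1 LIKE 'bravo%')) +
      (7 * (col1 LIKE '% bravo%')) +
      (3 * (col1 LIKE '%bravo%' AND col1 NOT LIKE '% bravo%' AND col1 NOT LIKE 'bravo%'))
      AS score
    FROM YourTable
    WHERE col1 LIKE '%alpha%' OR col1 LIKE '%bravo%'
    ORDER BY score DESC
    """
).fetchall()

print(rows)
```

For the search "alpha bravo" the scores are 15, 10, 9, 8, 7 and 3, which gives the order asked for in the question.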

Mysql searching multiple words using wildcard like and ordering by count of words matched

I have an InnoDB table (not supporting fulltext) that I search using LIKE statements:
Select text,id from searchTable where text like '%sub1%' or text like '%sub2%' or text like '%sub3%'
group by text
order by (CASE
when text LIKE 'sub1 %' then 1
when text LIKE 'sub1%' then 2
when text LIKE '% sub1%' then 3
when text LIKE '%sub1%' then 4
else 5
end),id
This returns results more or less as expected, yet I was wondering if I can also order by the number of substrings that matched, for example rows that contain all 3 substrings first, followed by rows that match 2 out of 3 substrings, etc.
Is it possible? Would it impact performance much?
My table contains around 50k rows, and the current statement takes around 0.6ms to execute.
Thanks
You can order by the number of matching substrings by doing:
order by ((text like '%sub1%') +
(text like '%sub2%') +
(text like '%sub3%')) desc
In an integer context, MySQL treats boolean values as integers, with 1 for true and 0 for false. So this counts the number of matches.
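A small experiment shows the match count in action. The sketch below assumes Python's sqlite3 module in place of MySQL (SQLite also evaluates a LIKE comparison to 1 or 0), and the four sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searchTable (id INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO searchTable VALUES (?, ?)",
    [
        (1, "sub1 only"),
        (2, "sub1 sub2 sub3"),
        (3, "sub2 and sub3"),
        (4, "nothing relevant"),
    ],
)

rows = conn.execute(
    """
    SELECT id,
           (text LIKE '%sub1%') + (text LIKE '%sub2%') + (text LIKE '%sub3%') AS matches
    FROM searchTable
    WHERE text LIKE '%sub1%' OR text LIKE '%sub2%' OR text LIKE '%sub3%'
    ORDER BY matches DESC, id
    """
).fetchall()

print(rows)  # [(2, 3), (3, 2), (1, 1)]
```

The row matching all three substrings comes first, and the row matching none is filtered out by the WHERE clause.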

finding a number in space separated list with REGEXP

I am writing a SQL query to select rows where a field of space-separated numbers contains a given single number, in this example the 1.
Example fields:
"1 2 3 4 5 11 12 21" - match, contains number one
"2 3 4 6 11 101" - no match, does not contain number one
The best query so far is:
$sql = "SELECT * from " . $table . " WHERE brands REGEXP '[/^1$/]' ORDER BY name ASC;";
The problem is that this REGEXP also matches 11.
I read many suggestions in other posts, for instance [\d]{1}, but the result is always the same.
Is it possible to accomplish what I want, and how?
You don't need regex: You can use LIKE if you add a space to the front and back of the column:
SELECT * from $table
WHERE CONCAT(' ', brands, ' ') LIKE '% 1 %'
ORDER BY name
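The padding trick is easy to verify on the two example fields. This sketch assumes Python's sqlite3 module instead of MySQL, so the concatenation is written with `||` rather than CONCAT; the table name `items` and the third row are made up to cover a number at the very end of the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, brands TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("a", "1 2 3 4 5 11 12 21"),  # contains the number 1
        ("b", "2 3 4 6 11 101"),      # only contains 11 and 101
        ("c", "11 21 1"),             # 1 is the last entry
    ],
)

rows = conn.execute(
    """
    SELECT name
    FROM items
    WHERE (' ' || brands || ' ') LIKE ('% ' || :n || ' %')
    ORDER BY name
    """,
    {"n": "1"},
).fetchall()

found = [r[0] for r in rows]
print(found)  # ['a', 'c']
```

Padding both the column and the pattern with spaces means the number is only matched as a whole entry, including at the start or end of the list.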
Try:
WHERE brands REGEXP '[[:<:]]1[[:>:]]'
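[[:<:]] and [[:>:]] match word boundaries before and after a word. On MySQL 8.0.4 and later, which uses the ICU regular expression library, these markers are not supported; use \b for a word boundary instead (written as '\\b1\\b' in a string literal).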
Why not FIND_IN_SET() + REPLACE() ?
SELECT
*
FROM
`table`
WHERE
FIND_IN_SET(1, REPLACE(`brands`, ' ', ','))
ORDER BY
`name` ASC;

A "Search" mysql query issue

I have this query where I can search the TABLE_GLOBAL_PRODUCTS
$catglobal_sql = "
select p.*,
case when
p.specials_new_products_price >= 0.0000
and p.expires_date > Now()
and p.status != 0
then p.specials_new_products_price
else p.products_price
end price
from ".TABLE_GLOBAL_PRODUCTS." p
INNER JOIN ".TABLE_STORES." s ON s.blog_id = p.blog_id
where
MATCH (p.products_name,p.products_description) AGAINST ('%".$search_key."%')
OR p.products_name like '%".$search_key."%'
order by p.products_date_added DESC, p.products_name";
The issue here is that, when I search with phrases like Cotton Shirts it displays correct results. However, when I only input a single word like Cotton it displays no results instead of displaying the same as when you input a phrase like Cotton Shirts.
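Use * instead of % as a wildcard when using MATCH ... AGAINST ... The * operator only works in boolean mode and is appended to the end of the word it applies to.
So the match part of your code should look like:
...
MATCH (p.products_name,p.products_description) AGAINST ('".$search_key."*' IN BOOLEAN MODE)
...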
In MATCH the wildcards are slightly different
To match zero or more characters use
In MATCH (boolean mode) '*', appended to the word
In LIKE '%'
To match any single character use
In MATCH there is no equivalent
In LIKE '_'