MySQL FULLTEXT search yields 0 for a one-letter difference

I have a table, let's call it A, with a FULLTEXT index on its name field.
The table has around 1200 rows, and exactly one of them has the name "TELEVISORI".
This query:
SELECT A.name, MATCH(name) AGAINST ('Televisori') AS `match`
FROM A
Results in (not the exact value, but I remember it being around 8):
+------------+-----------+
| name | match |
+------------+-----------+
| TELEVISORI | 8.3947893 |
+------------+-----------+
Whereas this one:
SELECT A.name, MATCH(name) AGAINST ('Televisore') AS `match`
FROM A
yields no results.
Things I've checked:
The word is not a stop word
Its length is over 4 characters (rather, 3 since I'm using InnoDB?)
The word appears in fewer than 50% of records: it's the only one in hundreds.
I tried changing the casing of the words in every possible combination just to be completely sure, but that shouldn't be it since I'm using a case insensitive collation. Also, it matches without a problem when I use "Televisori"
Is there anything I'm missing?

Related

MySQL Regex If string contains hyphen after 5 digits

So I have a slug column in my table and due to some bad coding, some of my slugs are messed up and need to be fixed.
I need to find all slugs that have a hyphen on both sides of exactly 5 digits, somewhere in the middle of the string.
Here are three sample slugs:
321-sw-2nd-ave-portland-or-97204-2-3-4-5
321-sw-2nd-ave-portland-or-97204-2-3
430-e-25th-st-tacoma-wa-98421
My expression would match the first and second but not the third one.
I would like to then get rid of those extra things after the zip code.
Here's what I have tried so far, but my Regex skills are lacking big time.
^(.)*d{5}-(.)*$
You are attempting to match on the entire string. I would simply do a partial match on the part that you are interested in. Another problem with your regex is that you use d to represent a digit: MySQL wants \\d; also, this notation is only supported from 8.0 (in earlier versions, you need [0-9]).
Consider:
slug RLIKE '[0-9]{5}-'
Demo on DB Fiddle:
with t as (
select '321-sw-2nd-ave-portland-or-97204-2-3-4-5' slug
union all select '321-sw-2nd-ave-portland-or-97204-2-3'
union all select '430-e-25th-st-tacoma-wa-98421'
)
select slug from t where slug RLIKE '[0-9]{5}-'
| slug |
| :--------------------------------------- |
| 321-sw-2nd-ave-portland-or-97204-2-3-4-5 |
| 321-sw-2nd-ave-portland-or-97204-2-3 |
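The question also asks how to get rid of the extra parts after the zip code. Here is a minimal sketch of that step in Python, outside the database. It assumes the slugs can be fixed in application code; the names strip_after_zip and ZIP_WITH_TAIL are made up for this illustration. The pattern adds a leading hyphen to the one above so that a longer run of digits does not count as a zip code.

```python
import re

# Hyphen, exactly five digits, hyphen, then whatever follows up to the end.
# Group 1 keeps the hyphen and the zip code; the rest is discarded.
ZIP_WITH_TAIL = re.compile(r"(-[0-9]{5})-.*$")

def strip_after_zip(slug):
    """Cut the slug right after the first -NNNNN- group; other slugs are returned unchanged."""
    return ZIP_WITH_TAIL.sub(r"\1", slug)

slugs = [
    "321-sw-2nd-ave-portland-or-97204-2-3-4-5",
    "321-sw-2nd-ave-portland-or-97204-2-3",
    "430-e-25th-st-tacoma-wa-98421",
]
cleaned = [strip_after_zip(s) for s in slugs]
for before, after in zip(slugs, cleaned):
    print(before, "->", after)
```

The first two slugs come out as 321-sw-2nd-ave-portland-or-97204 and the third is left alone. MySQL 8.0 and later also has REGEXP_REPLACE, which could probably do the same inside an UPDATE, but that variant is not tested here.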

MySQL FULLTEXT query issue

I'm trying to query using MySQL FULLTEXT, but unfortunately it returns empty results even though the table contains the search keyword.
Table: user_skills:
+----+----------------------------------------------+
| id | skills |
+----+----------------------------------------------+
| 1 | Dance Performer,DJ,Entertainer,Event Planner |
| 2 | Animation,Camera Operator,Film Direction |
| 3 | DJ |
| 4 | Draftsman |
| 5 | Makeup Artist |
| 6 | DJ,Music Producer |
+----+----------------------------------------------+
Indexes:
Query:
SELECT id,skills
FROM user_skills
WHERE ( MATCH (skills) AGAINST ('+dj' IN BOOLEAN MODE))
When I run the above query, none of the DJ rows are returned, even though the table has 3 rows containing the value dj.
A full text index is the wrong approach for what you are trying to do. But, your specific issue is the minimum word length, which is either 3 or 4 (by default), depending on the storage engine. This is explained in the documentation.
Once you reset the value, you will need to recreate the index.
I suspect you are trying to be clever. You have probably heard the advice "don't store lists of things in delimited strings". But you instead countered "ah, but I can use a full text index". You can, although you will find that more complex queries do not optimize very well.
Just do it right. Create the association table user_skills with one row per user and per skill that the user has. You will find it easier to use in queries, to prevent duplicates, to optimize queries, and so on.
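A rough sketch of that association table, run against an in-memory SQLite database from Python so it is self-contained. The column names user_id and skill are assumptions, and the id values from the question are treated as user ids. The CREATE TABLE and SELECT statements are plain SQL and should carry over to MySQL with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_skills (
        user_id INTEGER NOT NULL,
        skill   TEXT    NOT NULL,
        PRIMARY KEY (user_id, skill)  -- one row per user and skill, so no duplicates
    );
    CREATE INDEX idx_user_skills_skill ON user_skills (skill);
""")

# The delimited rows from the question, keyed by id.
delimited = {
    1: "Dance Performer,DJ,Entertainer,Event Planner",
    2: "Animation,Camera Operator,Film Direction",
    3: "DJ",
    4: "Draftsman",
    5: "Makeup Artist",
    6: "DJ,Music Producer",
}
for user_id, skills in delimited.items():
    conn.executemany(
        "INSERT INTO user_skills (user_id, skill) VALUES (?, ?)",
        [(user_id, skill) for skill in skills.split(",")],
    )

# A plain equality test finds the short skill name; no full-text index is involved.
dj_users = [row[0] for row in conn.execute(
    "SELECT user_id FROM user_skills WHERE skill = ? ORDER BY user_id", ("DJ",)
)]
print(dj_users)  # [1, 3, 6]
```

Because the lookup is an equality comparison on an indexed column, the two-character skill name is not affected by any minimum word length.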
Your search term is too short.
As the MySQL documentation says:
Some words are ignored in full-text searches:
Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is three characters for
InnoDB search indexes, or four characters for MyISAM. You can control
the cutoff by setting a configuration option before creating the
index: innodb_ft_min_token_size configuration option for InnoDB search
indexes, or ft_min_word_len for MyISAM.
Boolean full-text searches have these characteristics:
They do not use the 50% threshold.
They do not automatically sort rows in order of decreasing relevance.
You can see this from the preceding query result: The row with the
highest relevance is the one that contains “MySQL” twice, but it is
listed last, not first.
They can work even without a FULLTEXT index, although a search
executed in this fashion would be quite slow.
The minimum and maximum word length full-text parameters apply.
https://dev.mysql.com/doc/refman/5.6/en/fulltext-natural-language.html
https://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html
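To make the effect of the minimum length concrete, here is a deliberately simplified stand-in for what an index keeps. It is not MySQL's parser: it only splits on non-word characters and drops short tokens, and it ignores stopwords entirely. The function name indexed_tokens is made up for this sketch.

```python
import re

def indexed_tokens(text, min_token_size):
    """Split text into lowercase words and keep those of at least min_token_size characters."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= min_token_size]

row = "Dance Performer,DJ,Entertainer,Event Planner"

with_default = indexed_tokens(row, 3)  # 3 mirrors the InnoDB default quoted above
with_lowered = indexed_tokens(row, 2)  # a lowered minimum keeps two-letter words

print(with_default)  # ['dance', 'performer', 'entertainer', 'event', 'planner']
print(with_lowered)  # ['dance', 'performer', 'dj', 'entertainer', 'event', 'planner']
```

With a minimum of 3 the token dj never reaches the index, which is why a search for +dj has nothing to match.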

Store multiple values in a single cell instead of in different rows

Is there a way I can store multiple values in a single cell instead of different rows, and search for them?
Can I do:
pId | available
1 | US,UK,CA,SE
2 | US,SE
Instead of:
pId | available
1 | US
1 | UK
1 | CA
1 | SE
Then do:
select pId from table where available = 'US'
You can do that, but it makes the query inefficient. You can look for a substring in the field, but that means that the query can't make use of any index, which is a big performance issue when you have many rows in your table.
This is how you would use it in your special case with two character codes:
select pId from table where find_in_set('US', available)
Keeping the values in separate records makes every operation where you use the values, like filtering and joining, more efficient.
You can use the LIKE operator to get the result:
Select pid from table where available like '%US%'
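One thing to watch with the substring approach: it matches anywhere inside the cell, so it only behaves while no code contains another code. The sketch below uses SQLite from Python to show this. The table name products and the third row with the code AUS are hypothetical additions, and the comma-wrapping LIKE is a SQLite stand-in for a delimiter-aware match such as MySQL's FIND_IN_SET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (pId INTEGER, available TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        (1, "US,UK,CA,SE"),
        (2, "US,SE"),
        (3, "AUS,SE"),  # hypothetical row: 'AUS' contains 'US' as a substring
    ],
)

# Plain substring search: also picks up row 3.
like_hits = [r[0] for r in conn.execute(
    "SELECT pId FROM products WHERE available LIKE '%US%' ORDER BY pId"
)]

# Delimiter-aware search: wrap the list in commas and look for ',US,'.
exact_hits = [r[0] for r in conn.execute(
    "SELECT pId FROM products WHERE ',' || available || ',' LIKE '%,US,%' ORDER BY pId"
)]

print(like_hits)   # [1, 2, 3]
print(exact_hits)  # [1, 2]
```

Neither form can use an ordinary index on the column, so the one-row-per-value layout remains the more efficient option.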

Boolean Full-Text Search Exclude Phrase AB-CD, e.g. -"AB-CD"?

I have a table that is populated with certain values e.g.
| CODE | NAME | NB: THIS IS A VERY BASIC EXAMPLE
| zygnc | oscar alpha |
| ab-cd | delta tiger |
| fsdys | delta bravo |
Using MySQL Full-Text Boolean search I would like to search this table for all names containing 'delta' but exclude the first result based on its unique code 'ab-cd'. This code contains a minus sign; this is a requirement and removing it would not be possible.
So the following query 'should' in my mind accommodate for this:
SELECT code, name
FROM items
WHERE MATCH (code, name)
AGAINST ('delta -"ab-cd"' IN BOOLEAN MODE)
However, running this query does not produce the desired result: the result still contains the row with the code that is meant to be excluded, 'ab-cd'.
The collation of these two tables is set to utf8_bin.
The ft_min_word_len value is set to 4.
Could someone suggest a reason for this behavior? I assume the string is treated as two separate values, e.g. "-ab" and "-cd", and since the ft_min_word_len value is 4, neither of them can produce any result.
I would have thought that enclosing the phrase in double quotes would make the second minus sign literal, but that does not seem to be the case. Perhaps it has something to do with the table collation that I am not aware of?
In any case, any suggestions/advice/input/feedback/direction would be greatly appreciated, thank you!!
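You need to change the value of the ft_min_word_len variable in the my.cnf file.
The default value of ft_min_word_len is 4 (it applies to MyISAM; the InnoDB counterpart, innodb_ft_min_token_size, defaults to 3). After changing the variable, you need to restart the server and rebuild the FULLTEXT index.
Here "ab-cd" is treated as two words, "ab" and "cd", and both are shorter than the minimum word length, so they are ignored.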

Database reports different character lengths for the same word

I have a table in which I added English dictionary words. Now I have some records that seem to be duplicates, but the length of the string differs.
For example, 'aaron' appears twice in my table, but when I use this query:
select id, word, char_length(word) from my_table;
I get the following back:
id | word | char_length
7 | aaron | 5
12 | aaron | 6
How can the char_length change for the same word? What can I do to remove one word which exceeds length by 1?
It's likely @Vatev is on the right track with his comment.
Try these two queries:
1. SELECT * FROM my_table WHERE word = 'aaron';
2. SELECT * FROM my_table WHERE word like '%aaron%';
The first will only match rows where word is exactly aaron while the second will match rows which contain aaron anywhere. If there are rows with extra content, like whitespace, they would show up in the second, but not the first.
One possible way to clean up these duplicates would be to run the following:
DELETE FROM my_table WHERE TRIM(word) != word;
But don't run that blindly - it will delete all rows with extra whitespace, even if there isn't a matching "correct" entry.
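Before deleting anything, it may help to see what the extra character actually is. The sketch below uses SQLite from Python and assumes, purely for illustration, that the sixth character is a carriage return left over from an import; the real data may contain something else. MySQL has HEX() and CHAR_LENGTH() for the same kind of inspection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, word TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(7, "aaron"), (12, "aaron\r")],  # assumption: row 12 ends with a carriage return
)

# Show the length and the raw bytes of each word.
inspected = conn.execute(
    "SELECT id, length(word), hex(word) FROM my_table ORDER BY id"
).fetchall()
print(inspected)  # [(7, 5, '6161726F6E'), (12, 6, '6161726F6E0D')]

# TRIM with no extra argument strips spaces only, so it does not flag row 12.
trim_hits = [r[0] for r in conn.execute(
    "SELECT id FROM my_table WHERE TRIM(word) != word"
)]
print(trim_hits)  # []
```

If the hex dump shows something other than a space (0D here), the TRIM-based DELETE will not find the row, and the cleanup has to target that specific character instead.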