MySQL search text column for a specific word - mysql

What I would like to do is search for a specific word in my table, like so:
SELECT * FROM my_table WHERE MATCH (description) AGAINST ('test');
This should return all the rows in my_table that contain the word 'test'.
The problem is that this does not always happen: it only works sometimes, after I have recreated the table.
CREATE TABLE my_table
(date date,
time time,
name varchar(50),
description text,
FULLTEXT INDEX (description)) ENGINE=MYISAM;
Because it only happens sometimes, I believe there must be some problem with the indexing.
So I tried setting ft_min_word_len to 1, but that did nothing.
My MySQL version is 5.1.57-community.
Thank you for your help.

MATCH ... AGAINST is a natural-language search. If the word you pass to AGAINST() appears in 50% or more of the rows, it is considered too common and the search returns nothing. Instead, just use SELECT * FROM my_table WHERE description LIKE '%test%', where each % wildcard matches any text (including none) before or after the word.
See http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html to learn more.
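The 50% rule above explains why a freshly recreated table with only one or two rows returns nothing: every word is in at least half of the rows. The following is a minimal Python sketch of that rule only, not of MySQL's actual implementation; the function name too_common_words and the whitespace tokenization are assumptions made for illustration.

```python
def too_common_words(rows, threshold=0.5):
    """Return the words that occur in at least `threshold` of the rows.

    Simplified model of the natural-language 50% rule: tokenization is a
    plain whitespace split, which is an assumption, not MySQL's parser.
    """
    counts = {}
    for text in rows:
        # Count each word once per row, however often it repeats in that row.
        for word in set(text.lower().split()):
            counts[word] = counts.get(word, 0) + 1
    return {w for w, n in counts.items() if n / len(rows) >= threshold}


# With two rows, a word present in one row is already in 50% of the rows.
small_table = ["this is a test", "something else"]
# With more rows, the same word falls below the threshold.
larger_table = ["this is a test", "something else", "third row", "fourth row"]
```

In the small table 'test' is flagged as too common, so a natural-language search for it would match nothing; in the larger table it is not flagged.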

Related

Search table with 30 million records

I have a sample MySQL database with only one table (InnoDB) that has the following columns:
Id int PK
Description TEXT
The table has more than 30 million records, and the Description field is up to 1000 characters long.
What is the most efficient way to search for records in this table?
For example, I need descriptions that start with, end with, or contain a given string.
When I run a query like
SELECT * FROM tbl WHERE Description like '%abc'
it takes too long because the LIKE operator scans all the table's records.
I googled and found that there is something called a full-text index.
I added the index using the following command:
ALTER TABLE tbl ADD FULLTEXT INDEX `DescriptionIndex` (`Description` ASC)
Then, when I try to execute a query like this
SELECT * FROM tbl WHERE MATCH (`Description`) AGAINST ('"The sea is awesome"')
sometimes it takes a long time and other times it works well, depending on the value in the AGAINST parameter. I could not identify the problem.
I need to know whether I am missing something or whether there is a better way to implement the search.

MySql Indexing part of a column

I need to search a medium-sized MySQL table (about 15 million records).
My query searches for a value ending with another value, for example:
SELECT * FROM {tableName} WHERE {column} LIKE '%{value}'
{value} is always 7 characters long.
{column} is sometimes 8 characters long (otherwise it is 7).
Is there a way to improve the performance of my search?
Clearly, an index is not an option.
I could store the {column} values in reverse order in another column and index that column, but I'm looking to avoid this solution.
{value} is always 7 characters length
Your data is not normalized. Fixing this is the way to fix the problem. Anything else is a hack. Having said that, I accept it is not always practical to repair damage done in the past by dummies.
However, the most appropriate hack depends on a whole lot of information you've not told us about:
how frequently you will run the query
what the format of the composite data is
but im looking to avoid this solution.
Why? It's a reasonable way to address the problem. The only downside is that you need to maintain the new attribute. Given that this data domain appears in different attributes in multiple tables (another normalization violation), it would make more sense to implement the index in a separate EAV relation; you just need to add triggers on the original table to keep it in sync with your existing code base. Every solution I can think of will likely require a similar fix.
Here's a simplified example (no multiple attributes) to get you started:
CREATE TABLE lookup (
table_name VARCHAR(18) NOT NULL,
record_id INT NOT NULL, /* or whatever */
suffix VARCHAR(7),
PRIMARY KEY (table_name, record_id),
INDEX (suffix, table_name, record_id)
);
/* Each trigger needs its own name; all three keep lookup in sync. */
CREATE TRIGGER insert_suffix AFTER INSERT ON yourtable
FOR EACH ROW
REPLACE INTO lookup (table_name, record_id, suffix)
VALUES ('yourtable', NEW.id
, RIGHT(NEW.attribute, 7)
);
CREATE TRIGGER update_suffix AFTER UPDATE ON yourtable
FOR EACH ROW
REPLACE INTO lookup (table_name, record_id, suffix)
VALUES ('yourtable', NEW.id
, RIGHT(NEW.attribute, 7)
);
CREATE TRIGGER delete_suffix AFTER DELETE ON yourtable
FOR EACH ROW
DELETE FROM lookup WHERE table_name='yourtable' AND record_id=OLD.id
;
If there is a fixed set of options for the first character, then you can use IN. For instance:
where column in ('{value}', '0{value}', '1{value}', . . . )
This allows MySQL to use an index on the column.
Unfortunately, with a wildcard at the beginning of the pattern, it is hard to use an index. Is it possible to store the first character in another column?
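The reversed-column idea mentioned in the question turns a suffix search into a prefix search, which an ordinary index can serve. Here is a minimal Python sketch using the standard sqlite3 module rather than MySQL, purely so that it is self-contained; the table t, the column col_rev and both function names are hypothetical, and it assumes the suffix contains no % or _ wildcard characters.

```python
import sqlite3


def build_db(values):
    """Create an in-memory table that stores each value plus its reverse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT, col_rev TEXT)"
    )
    # The index is on the reversed copy, so a prefix pattern can use it.
    conn.execute("CREATE INDEX idx_col_rev ON t (col_rev)")
    conn.executemany(
        "INSERT INTO t (col, col_rev) VALUES (?, ?)",
        [(v, v[::-1]) for v in values],
    )
    return conn


def find_by_suffix(conn, suffix):
    """Find values ending in `suffix` via a prefix match on the reversed column."""
    # Assumption: `suffix` holds no LIKE wildcards (% or _).
    pattern = suffix[::-1] + "%"
    rows = conn.execute(
        "SELECT col FROM t WHERE col_rev LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


conn = build_db(["1234567", "91234567", "7654321"])
matches = find_by_suffix(conn, "1234567")
```

In MySQL the same pattern would be col_rev LIKE '7654321%' against an indexed col_rev column, with triggers or application code keeping the reversed copy up to date.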

MySQL: Why I get no results for a specific query on fulltext index?

I have some trouble with a MySQL table and a fulltext index. My table structure looks similar to this:
CREATE TABLE example_index (
id int(11) NOT NULL auto_increment,
title tinytext,
content text,
FULLTEXT INDEX title (title),
FULLTEXT INDEX titlecontent (title,content),
PRIMARY KEY (id)
)ENGINE=MyISAM;
As you can see, I have created one fulltext index on title and another on title and content in combination. I have also stored some data in this table, like this:
id | title | content
1 | Teamwork | Example data
2 | CSV-Export | Testdata
If I try to get some data from this table via a MATCH ... AGAINST query, I get one result for the following query:
SELECT *
FROM example_index
WHERE MATCH (title) AGAINST('csv' IN BOOLEAN MODE);
BUT no results for this query:
SELECT *
FROM example_index
WHERE MATCH (title) AGAINST('Team' IN BOOLEAN MODE);
Can someone tell me why I get no results here?
MySQL fulltext indexes index words. This is a very important limitation. You do not have the word team in your data, but you do have the word csv because - is considered a word delimiter.
If you are looking for words starting with team then you can still use the fulltext index using the * operator within the string to be searched:
SELECT *
FROM example_index
WHERE MATCH (title) AGAINST('Team*' IN BOOLEAN MODE);
If you would like to get records that contain the substring team anywhere within the indexed field, then MySQL fulltext index and fulltext search cannot help you. In this case you either need to revert to the good old title like '%team%' or you need to use a different fulltext search provider that can use MySQL.
Full text search looks for words in the text not partial words. Because your text contains "Teamwork", it does not contain team. This is a limitation of many full text implementations.
If your data is not too big, you can use like:
SELECT *
FROM example_index
WHERE title LIKE '%Team%';
If you are just looking for titles that start with "Team", you can do:
SELECT *
FROM example_index
WHERE title LIKE 'Team%';
The advantage of this approach is that it can use a regular index on example_index(title).
I notice that MySQL now has an n-gram parser. This parser is designed for non-English (primarily east Asian) languages. However, it might work on English as well. You might be able to try this to do what you want.
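The difference between whole-word, prefix and substring matching described in these answers can be modelled in a few lines of Python. This is a simplified sketch, not MySQL's parser: the helper names are hypothetical, words are split on anything that is not a letter, digit or underscore, and minimum word length and stopwords are ignored.

```python
import re


def words(text):
    """Split text into lowercase words; '-' and other punctuation delimit words."""
    return [w.lower() for w in re.split(r"[^0-9A-Za-z_]+", text) if w]


def match_word(text, term):
    """Roughly what AGAINST('term' IN BOOLEAN MODE) does: whole words only."""
    return term.lower() in words(text)


def match_prefix(text, term):
    """Roughly what AGAINST('term*' IN BOOLEAN MODE) does: word prefixes."""
    return any(w.startswith(term.lower()) for w in words(text))


def match_substring(text, term):
    """Roughly what LIKE '%term%' does: the term anywhere in the text."""
    return term.lower() in text.lower()
```

Under this model, match_word("CSV-Export", "csv") is true because the hyphen splits the title into two words, while match_word("Teamwork", "Team") is false and only match_prefix or match_substring finds it.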

SQL using "in" for looking up substrings

So, we know this works when I want to select all IDs that are present in the inner SQL statement:
Select *
FROM TableA
WHERE Column1 IN (SELECT column2 FROM tableB WHERE = condition)
What syntax do I need if Column1 is a long string and I need to check whether a certain substring exists in it?
E.g. Column1 = "text text text text 12345", where 12345 is an ID that is present in the list of IDs returned by the inner SQL statement.
Basically, I'm trying to detect whether an ID is present in one of the strings in one table, based on a list of IDs from another table.
Should I do this in SQL or let server-side code do it?
This is usually done using the LIKE operator:
SELECT ... FROM ... WHERE Column1 LIKE '%12345%';
However this is extremely slow, since it is based on substring matching. To improve performance you have to create a search index table storing single words. Such index typically is maintained by trigger definitions: whenever an entry is changed the trigger also changes the set of words extracted into the search index table. Searching in such an index table is obviously fast and can be combined with the original table by means of a JOIN based on the n:1 relationship between words in the index to the original entries in your table.
Instead of using a fieldname LIKE '%needle%' search, which is extremely slow because it cannot use indexes, create a fulltext index on the given column and use fulltext search to find the matching words.
The code excerpt below is quoted from the MySQL documentation:
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
) ENGINE=InnoDB;
SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database' IN NATURAL LANGUAGE MODE);
The catch with this syntax is that the list of words being looked for ('database' in the above code example) must be a string literal; it cannot be a subquery. You need to assemble the list of keywords in the application that calls the SQL statement.
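Assembling the keyword list in the application might look like the Python sketch below. It only builds the statement and its parameter tuple; the function name build_fulltext_query is hypothetical, the articles table comes from the quoted documentation example, and the %s placeholder assumes a MySQL driver that uses that parameter style. The IDs would first be fetched with the inner query and then passed in as strings.

```python
import re


def build_fulltext_query(keywords):
    """Build a parameterized full-text query from a list of keywords.

    Returns (sql, params) for a driver's cursor.execute(sql, params).
    The %s placeholder style is an assumption about the driver in use.
    """
    # Keep only word characters so a keyword cannot inject search operators.
    cleaned = [re.sub(r"[^0-9A-Za-z_]", "", k) for k in keywords]
    search = " ".join(k for k in cleaned if k)
    sql = (
        "SELECT * FROM articles "
        "WHERE MATCH (title, body) AGAINST (%s IN NATURAL LANGUAGE MODE)"
    )
    return sql, (search,)


sql, params = build_fulltext_query(["12345", "67 890", ""])
```

In natural language mode the space-separated words are matched individually, so a row containing any one of the IDs as a whole word can qualify.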

How to do a CONTAINS() on two columns of Full Text Index Search SQL

I have a table (MyTable) with the following columns:
Col1: NameID VARCHAR(50) PRIMARY KEY NOT NULL
Col2: Address VARCHAR(255)
Data Example:
Name: '1 24'
Address: '1234 Main St.'
I created a full-text index on the table after creating the catalog with the default parameters.
How can I achieve the following query:
SELECT * FROM MyTable
WHERE CONTAINS(NameID, '1')
AND CONTAINS(Address, 'Main St.');
But my query is returning no results, which doesn't make sense because this does work:
SELECT * FROM MyTable
WHERE CONTAINS(Address, 'Main St.');
and so does this:
SELECT * FROM MyTable
WHERE CONTAINS(Address, 'Main St.')
AND NameID LIKE '1%'
but this also doesn't work:
SELECT * FROM MyTable
WHERE CONTAINS(NameID, '1');
Why can't I query on the indexed primary key column (NameID) when I selected this column to be included with the Address column when setting up the full-text index?
Thanks in advance!
Since the NameID field is of type varchar, full-text will handle the indexing just fine.
The reason CONTAINS(NameID, '1') returns no results is that '1' (like other such small numbers) is regarded as a noise word by full-text search and filtered out at indexing time.
To get a list of the stop words, run the following query -
select * from sys.fulltext_system_stopwords where language_id = 1033;
You need to turn off or modify the stop list, an example of which can be found here.
I think the biggest problem here (and I edited my question to reflect this) is that the primary key's names are integers, with which the CONTAINS() function on the full-text catalog is not compatible. This is unfortunate, and I'm still searching for a full-text alternative for working with catalogs of integers.