I am querying a table (about 150,000 rows and growing) with a big varchar field (size 2000) which can't be indexed (and there's no point even if it could be). I am using SQL Server 2008.
The query I used till now was:
select * from tbl_name
where field_name like '%bla bla%'
("bla bla" is whatever the user searched for)
To improve performance, I want to start using the Full-Text Search feature (I have already defined a catalog and a full-text index on this field).
I am a bit confused by what I read about querying with this option.
What query should I use to get exactly the same results as my previous query?
Comments:
I would like results that are not case-sensitive, as before (meaning if the user searches for "LG" he will also get results that contain "Lg").
If user enters "Sams" he will also get "Samsung".
Thanks!
Eran.
CONTAINS() will give you the LIKE functionality you are looking for, with one exception: I noticed in the comments that you also want to match the second entry, "hhhEranttt". Unfortunately, that is currently not possible, because full-text search does not support suffix (leading-wildcard) searches.
For the other entries you can run a prefix search, CONTAINS(field_name, '"eran*"'), which matches all of them since full-text searches are case-insensitive.
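To see why "hhhEranttt" is out of reach, here is a rough Python model comparing the old LIKE '%term%' behavior with a full-text prefix term. It is an assumption, not SQL Server's actual word breaker: text is split into words on non-alphanumeric characters, and a prefix term matches any word that starts with the term, case-insensitively. The helper names and sample rows are hypothetical; `contains_prefix_term` only builds the search-condition string you would pass to CONTAINS.

```python
import re

def like_substring(text: str, term: str) -> bool:
    """Model of LIKE '%term%' under a case-insensitive collation."""
    return term.lower() in text.lower()

def prefix_term_matches(text: str, term: str) -> bool:
    """Rough model of CONTAINS(col, '"term*"'): some word starts with term."""
    words = re.findall(r"[0-9A-Za-z]+", text.lower())
    return any(w.startswith(term.lower()) for w in words)

def contains_prefix_term(user_input: str) -> str:
    """Build a prefix term such as "Sams*" for CONTAINS (hypothetical helper)."""
    cleaned = user_input.replace('"', "").strip()  # keep input inside the quotes
    if not cleaned:
        raise ValueError("empty search term")
    return f'"{cleaned}*"'

rows = ["Eran", "hhhEranttt", "ERAN and co", "eranxyz"]  # hypothetical data
print([r for r in rows if like_substring(r, "eran")])       # all four rows
print([r for r in rows if prefix_term_matches(r, "eran")])  # all but hhhEranttt
print(contains_prefix_term("Sams"))                         # "Sams*"
```

In T-SQL the built string can then be passed as a variable, e.g. `WHERE CONTAINS(field_name, @term)`, rather than concatenated into the query text.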
HTH.
This must be a niche scenario, since I have not been able to find a similar question, and in my brief testing in my SQL workbench, simply using the string in place of the column name did not work.
For example:
SELECT MATCH ('fork') AGAINST ('user entered text about forks' IN NATURAL LANGUAGE MODE);
Doesn't work...
I have a query that returns matches on a full-text index, with the relevance score as one of the returned columns. In this app, I am looking for "search suggestions" in a suggestions table that is built from the website's search index content. The client side also stores everything the user searches for in their local browser storage.
Currently, I have front-end code that uses a regex to pull matches (up to 5) from their local search history, and then sends what they typed (as they type) to the back end to get the best matches from the suggestions table.
Right now, the (up to 5) history matches are shown first, and the remaining slots, up to 10 in total, are filled with matches from the back end. What I would prefer is to send the history matches to the back end and include them in the full-text MATCH query somehow, so that the result set contains all matched suggestions from the table plus the history matches sent from the front end, all sorted by full-text relevance score. With this approach, no history matches might show, or more than 5 might; it would all come down to the relevance score.
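The ordering described above amounts to merging two scored lists and keeping the best 10. A minimal Python sketch, assuming the history matches can be given a relevance score on the same scale as the table's MATCH score (not guaranteed when the scores come from different engines); the function name and sample scores are hypothetical:

```python
from collections.abc import Iterable

def merge_suggestions(table_hits: Iterable[tuple[str, float]],
                      history_hits: Iterable[tuple[str, float]],
                      limit: int = 10) -> list[str]:
    """Merge (suggestion, score) pairs from both sources, best score first."""
    best: dict[str, tuple[str, float]] = {}
    for text, score in [*table_hits, *history_hits]:
        key = text.lower()  # treat "Samsung TV" and "samsung tv" as one suggestion
        if key not in best or score > best[key][1]:
            best[key] = (text, score)  # keep the highest score per suggestion
    ranked = sorted(best.values(), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:limit]]

table_hits = [("samsung tv", 3.2), ("samsung phone", 2.5)]    # hypothetical MATCH scores
history_hits = [("samsung phone", 2.9), ("sams club", 1.1)]   # hypothetical scored history
print(merge_suggestions(table_hits, history_hits))  # ['samsung tv', 'samsung phone', 'sams club']
```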
Is something like this possible? The only other way I can imagine doing this is to create a temporary table with a full-text index on the fly, join it in my current query, and remove it when done. The problem, in my mind, is that this all happens in real time as the user types, so I don't want to add something like that if it's going to bog down the response time. Is there a fast/optimal way of doing this? Is there a way to have the temporary table removed when the query ends?
Or is there some other command that can just give me a score for a string value against what the user typed, like what I tried above?
EDIT:
It looks like my temporary table idea could work:
https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html
I'll just have to see what kind of performance impact this has. I'm still interested to hear whether this is the best / only way or if there is a better one.
The CREATE TEMPORARY TABLE route was the way to go here. I tested it out and it's working.
Worthy of note for future travelers: I had to switch my main table from InnoDB to MyISAM for this to work. I was able to mix and match the MyISAM temp table with the InnoDB main table, but the scoring algorithms are different, so the InnoDB matches took priority due to their higher scores. This was not an issue for me, as I really did not need or use transactions for the primary suggestions table, so I just made both tables MyISAM.
Another item of note: I had to switch to splitting the user's query into "words", encapsulating them with "*", and running the match as a boolean search instead of natural language. In the temp table, a user would likely have entered similar searches, so most of the words were in more than 50% of the rows and no matches were returned. Boolean search works around this. Again, not a big deal for my particular use case.
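Building that boolean-mode search string could look roughly like this. It is an assumption about the splitting step, not the poster's code: it replaces characters that MySQL treats as boolean operators with spaces and appends the `*` truncation operator to each word (MySQL honors `*` only at the end of a word). The function name is hypothetical.

```python
import re

# Characters with special meaning in MySQL BOOLEAN MODE search strings.
_BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')

def boolean_prefix_query(user_input: str) -> str:
    """Turn 'samsung gal' into 'samsung* gal*' for AGAINST (? IN BOOLEAN MODE)."""
    cleaned = _BOOLEAN_OPERATORS.sub(" ", user_input)  # user can't inject operators
    return " ".join(word + "*" for word in cleaned.split())

print(boolean_prefix_query("samsung gal"))  # samsung* gal*
```

The result would be bound as a parameter, e.g. `WHERE MATCH(suggestion) AGAINST (? IN BOOLEAN MODE)` (column name hypothetical), and an empty string should skip the query. Keep in mind that MyISAM ignores words shorter than `ft_min_word_len` (4 by default).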
Had I needed to stay in innodb for this, it would have been a problem because from what I can tell, there is no way to set a full text index on an innodb temporary table.
I found some threads about this, but none of them fit my case.
I have a search field in my mobile app where, on every text change, a real-time search runs by calling my API.
The search request starts only once 3 or more characters have been entered, and it searches ONLY in 1 DB column, called TITLE. So each time the user enters a letter, a new query runs.
Currently I have it like this (I know this solution is very bad). $searchedword is the word user entered:
if (!empty($searchedword) && strlen($searchedword) > 2) {
    $searchedword = strtolower($searchedword);
    $result = $mysqli->query("SELECT * FROM TABLE"); // fetches every row
    $output = '';
    if ($result->num_rows > 0) {
        while ($data = $result->fetch_array()) {
            $title = strtolower($data['title']);
            // substring match done in PHP, not in MySQL
            if (strpos($title, $searchedword) !== false) {
                $output .= $title . ',' . $data['content'];
            }
        }
    }
}
So this just checks whether the title from the DB contains the searched word. It works very well, but I think it is very bad for performance, because every time the user enters a letter in the search field, all the data in the table is fetched and searched for that word.
I want to rewrite my code for the best possible performance.
So my first question is: should I add a FULLTEXT INDEX to the TITLE column in the DB? Will it help, or will it just increase disk usage? I am searching against just 1 column, and that column holds only a title (1 or 2 words max).
And second question: what would be the best query for my case, with the best performance? I need to search after each letter the user enters.
Can I use the search this way?
SELECT * FROM TABLE WHERE MATCH (title) AGAINST ('$searchedword' IN NATURAL LANGUAGE MODE)
However, it seems this returns rows only when the search term completely matches a word in the title, and returns nothing when the term is just part of the title, so it is not a good solution.
The only solution which works is this:
SELECT * FROM TABLE WHERE title LIKE '%$searchedword%'
But what about performance? And I don't understand how this works: $searchedword is converted to lowercase and I have removed its accents, while the TITLE column in the DB has accents and uppercase letters, yet this search works very well!
If your title column has a collation like utf8mb4_general_ci, you don't have to worry about dealing with upper case, lower case, and diacritical marks in your MySQL WHERE clauses. MySQL will do it for you. It is really good at handling character sets and collations in all kinds of languages. (Such things are very helpful to Swedish-language users, and the inventors of MySQL are Swedish.)
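Roughly speaking, a case- and accent-insensitive collation compares strings as if both sides had been folded first. The Python sketch below approximates that with Unicode decomposition plus `casefold()`; it is only an illustration, since utf8mb4_general_ci uses MySQL's own weight tables and differs in edge cases. The function names are hypothetical.

```python
import unicodedata

def fold(text: str) -> str:
    """Approximate a *_ci collation: drop accents, ignore case."""
    decomposed = unicodedata.normalize("NFD", text)  # "é" -> "e" + combining accent
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()

def ci_contains(haystack: str, needle: str) -> bool:
    """Model of title LIKE '%needle%' under an accent/case-insensitive collation."""
    return fold(needle) in fold(haystack)

print(ci_contains("Crème Brûlée", "creme brulee"))  # True
```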
FULLTEXT with NATURAL LANGUAGE MODE is probably not the right approach for this application. It works on words, not chunks of letters. So it probably won't give you anything until your user has typed a whole word, and not a stop word. And, it is a little squirrely when you search a table with only a few rows. So, that might be a problem if you're just getting started.
It does order the results by the closeness of the match, so the most likely hit is the first one. So, if you know you have a phrase to search, it's good.
For your progressive-search application you may want to use one of these two LIKE queries.
SELECT title FROM tbl WHERE title LIKE CONCAT('$searchedword', '%') /*insecure*/
Or use this one, which is much slower but finds your partial match anywhere in the title, not just at the beginning.
SELECT title FROM tbl WHERE title LIKE CONCAT('%', '$searchedword', '%') /*insecure*/
Avoid running these queries until you have gathered at least a few letters from your user, otherwise you'll get absurdly many results.
In these cases, use SELECT title, not SELECT *, and create an ordinary index on the title column. That way MySQL can satisfy the whole query from the index, which makes it much faster.
And, use MySQL's WHERE functionality to do the matching. Don't fetch the whole table from MySQL and search it in your php program.
And, use prepared statements. Because cybercreeps.
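Here is a runnable sketch of that advice, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; with PHP and mysqli you would use `$mysqli->prepare()` and `bind_param()` the same way). It selects only `title`, binds the user's text as a parameter instead of splicing it into the SQL, and escapes `%` and `_` so they are matched literally. Table, column and function names are hypothetical. Note that SQLite's LIKE is case-insensitive only for ASCII letters, whereas MySQL follows the column's collation; MySQL also already uses backslash as its default LIKE escape character.

```python
import sqlite3

def escape_like(term: str, esc: str = "\\") -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return (term.replace(esc, esc + esc)
                .replace("%", esc + "%")
                .replace("_", esc + "_"))

def search_titles(conn: sqlite3.Connection, term: str, anywhere: bool = False) -> list[str]:
    """Prefix search by default; match anywhere in the title if anywhere=True."""
    pattern = escape_like(term) + "%"
    if anywhere:
        pattern = "%" + pattern
    # The ? placeholder binds the value: user input never becomes SQL text.
    cur = conn.execute(
        "SELECT title FROM tbl WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (pattern,),
    )
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_tbl_title ON tbl (title)")  # ordinary index on title
conn.executemany(
    "INSERT INTO tbl (title) VALUES (?)",
    [("Samsung Galaxy",), ("LG Gram",), ("samsonite bag",), ("50% off",), ("500 items",)],
)
print(search_titles(conn, "sams"))                # ['Samsung Galaxy', 'samsonite bag']
print(search_titles(conn, "gal", anywhere=True))  # ['Samsung Galaxy']
print(search_titles(conn, "50%"))                 # ['50% off']
```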
I have strings like the following in my VARCHAR InnoDB table column:
"This is a {{aaaa->bbb->cccc}} and that is a {{dddd}}!"
Now, I'd like to search for e.g. {{xxx->yyy->zzz}}. The braces are part of the string. Sometimes the search is combined with another column, but that column only contains an ordinary id and hence doesn't need to be considered (I guess).
I know I can use LIKE or REGEXP. But these (already tried) ways are too slow. Can I introduce a fulltext index? Or should I add another helping table? Should I replace the special characters {, }, -, > to get words for the fulltext search? Or what else could I do?
The search runs over some ten thousand rows, and I expect to often get about one hundred hits.
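For the helper-table option mentioned in the question, one possible layout (an assumption, not an established recipe) is to extract every {{...}} placeholder when a row is written and store it in a separate, indexed table, so a lookup becomes an exact equality match instead of LIKE or REGEXP. This is sketched with Python's sqlite3 as a stand-in for MySQL; all table, column and function names are hypothetical.

```python
import re
import sqlite3

PLACEHOLDER = re.compile(r"\{\{(.*?)\}\}")

def extract_placeholders(text: str) -> list[str]:
    """'a {{x->y}} b {{z}}' -> ['x->y', 'z']"""
    return PLACEHOLDER.findall(text)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE doc_placeholders (
        doc_id INTEGER REFERENCES docs(id),
        placeholder TEXT
    );
    CREATE INDEX idx_placeholder ON doc_placeholders (placeholder, doc_id);
""")

def add_doc(body: str) -> int:
    """Insert a document and record its placeholders in the helper table."""
    doc_id = conn.execute("INSERT INTO docs (body) VALUES (?)", (body,)).lastrowid
    conn.executemany(
        "INSERT INTO doc_placeholders (doc_id, placeholder) VALUES (?, ?)",
        [(doc_id, p) for p in set(extract_placeholders(body))],
    )
    return doc_id

def find_docs(placeholder: str) -> list[int]:
    """Exact, index-backed lookup instead of LIKE '%{{...}}%' or REGEXP."""
    key = placeholder.removeprefix("{{").removesuffix("}}")  # accept "{{x}}" or "x"
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM doc_placeholders WHERE placeholder = ? ORDER BY doc_id",
        (key,),
    )
    return [row[0] for row in rows]

first = add_doc("This is a {{aaaa->bbb->cccc}} and that is a {{dddd}}!")
second = add_doc("Only {{dddd}} here")
print(find_docs("{{aaaa->bbb->cccc}}"))  # [1]
```

The trade-off is that the helper table must be kept in sync whenever the main column changes.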
This link should give you all the info you need regarding FULLTEXT indexes in MySQL.
MySQL dev site
The section that you will want to pay particular attention to is:
"Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row."
So, in short, to answer your question: you should see an improvement in query execution times by adding a full-text index on wide VARCHAR columns, provided you are using a compatible storage engine (InnoDB, from MySQL 5.6 on, or MyISAM).
Also here is an example of how you can query the full text index and also an additional ID field as hinted in your question:
SELECT *
FROM table
WHERE MATCH (fieldlist) AGAINST ('search text here')
AND ( field2= '1234');
Hi all,
I have created a simple table called classics in a database called publications on XAMPP. I am trying to do a MATCH AGAINST search for an author name, which I thought I understood.
I have also made sure the table is FULLTEXT indexed on both the author and title columns, as required. The table also uses the MyISAM engine.
I tried this and it failed.
SELECT author FROM classics WHERE MATCH(author) AGAINST('Charles');
I know Charles is present in the author column, as you can see, but I get no rows returned.
Now if I rewrite it for any other author, it works:
SELECT author FROM classics WHERE MATCH(author) AGAINST ('jane');
Here is what I get with jane...
I'm not sure, but earlier it seemed I had to include both indexed fields in the query, instead of being able to search author alone. Is this correct, and does anyone know why I can't get charles returned?
Many thanks!
It's not returning those rows because "charles" appears in at least 50% of the rows. This is a well-documented restriction of MySQL FULLTEXT search.
If you want to get around this restriction, you can use BOOLEAN MODE.
Here's the relevant excerpt from the manual:
A word that matches half of the rows in a table is less likely to locate relevant documents. In fact, it most likely finds plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that rows containing the word are assigned a low semantic value for the particular data set in which they occur. A given word may reach the 50% threshold in one data set but not another.
The 50% threshold has a significant implication when you first try full-text searching to see how it works: If you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more. Users who need to bypass the 50% limitation can use the boolean search mode; see Section 12.9.2, “Boolean Full-Text Searches”.
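To make the threshold concrete, here is a rough Python model. It only counts how many rows contain the word and treats it as ignored at 50% or more; MyISAM's real relevance formula is more involved, and the author list is hypothetical because the question's table isn't shown.

```python
import re

def words(text: str) -> set[str]:
    """Lowercased alphanumeric words, a crude stand-in for MySQL's parser."""
    return set(re.findall(r"[0-9a-z]+", text.lower()))

def ignored_in_natural_mode(rows: list[str], word: str) -> bool:
    """Rough model of the MyISAM 50% rule: a word in half the rows or more is ignored."""
    if not rows:
        return False
    hits = sum(1 for row in rows if word.lower() in words(row))
    return hits / len(rows) >= 0.5

authors = ["Charles Dickens", "Charles Darwin", "Jane Austen", "Mark Twain"]  # hypothetical
print(ignored_in_natural_mode(authors, "Charles"))        # True  (2 of 4 rows)
print(ignored_in_natural_mode(authors, "jane"))           # False (1 of 4 rows)
print(ignored_in_natural_mode(["Jane Austen"], "jane"))   # True: one-row table
```

The last line mirrors the manual's warning: in a table with only one or two rows, every word reaches the threshold.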
If I store the contents of an HTML TEXTAREA in my ODBC database each time the user submits a form, what SELECT statement retrieves 1) all rows that contain a given substring and 2) all rows that don't? (And is the search case-sensitive?)
Edit: if LIKE "%SUBSTRING%" is going to be slow, would it be better to get everything & sort it out in PHP?
Well, you can always try WHERE textcolumn LIKE "%SUBSTRING%", but this is guaranteed to be pretty slow: because the pattern starts with a wildcard, the query can't use an index to find matches.
It depends on the field type: a textarea usually won't be saved as VARCHAR but rather as (some kind of) TEXT field, so you can add a FULLTEXT index and use the MATCH ... AGAINST operator (FULLTEXT indexes also work on CHAR and VARCHAR columns).
To get the columns that don't match, simply put a NOT in front of the like: WHERE textcolumn NOT LIKE "%SUBSTRING%".
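A small demo of the two queries, using Python's sqlite3 as a stand-in for MySQL (table name and rows hypothetical). One subtlety: a row whose column is NULL matches neither LIKE nor NOT LIKE, because a comparison with NULL yields NULL. This is standard SQL behavior and applies to MySQL too, so add `OR textcolumn IS NULL` if you want those rows in the second result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, textcolumn TEXT)")
conn.executemany(
    "INSERT INTO posts (textcolumn) VALUES (?)",
    [("Hello World",), ("goodbye",), ("say HELLO",), (None,)],
)

def ids(sql: str, *params) -> list[int]:
    """Run a query and return the id column as a list."""
    return [row[0] for row in conn.execute(sql, params)]

term = "%hello%"
with_sub = ids("SELECT id FROM posts WHERE textcolumn LIKE ? ORDER BY id", term)
without_sub = ids("SELECT id FROM posts WHERE textcolumn NOT LIKE ? ORDER BY id", term)
with_nulls = ids(
    "SELECT id FROM posts WHERE textcolumn NOT LIKE ? OR textcolumn IS NULL ORDER BY id",
    term,
)
print(with_sub, without_sub, with_nulls)  # [1, 3] [2] [2, 4]
```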
Whether the search is case-sensitive depends on how you store the data, especially which COLLATION you use. With the default case-insensitive (_ci) collations, the search is case-insensitive.
Updated answer to reflect question update:
A WHERE field LIKE "%value%" is slower than WHERE field LIKE "value%" if the column field has an index, but it is still considerably faster than fetching all values and having your application filter them. Compare both scenarios:
1/ If you do SELECT field FROM table WHERE field LIKE "%value%", MySQL will scan the entire table, and only send the fields containing "value".
2/ If you do SELECT field FROM table and then have your application (in your case PHP) filter only the rows with "value" in it, MySQL will also scan the entire table, but send all the fields to PHP, which then has to do additional work. This is much slower than case #1.
Solution: Please do use the WHERE clause, and use EXPLAIN to see the performance.
Info on MySQL's full-text search. In MySQL 5.5 and earlier this is restricted to MyISAM tables, so it may not be suitable if you want to use a different table type (InnoDB supports FULLTEXT indexes from MySQL 5.6 on).
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Even if WHERE textcolumn LIKE "%SUBSTRING%" is going to be slow, I think it is probably better to let the Database handle it rather than have PHP handle it. If it is possible to restrict searches by some other criteria (date range, user, etc) then you may find the substring search is OK (ish).
If you are searching for whole words, you could pull all the individual words out into a separate table and use it to restrict the substring search. (So when searching for "my search string", you would look up the longest word, "search", and only do the substring search on records containing the word "search".)
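The word-table idea can be sketched in plain Python: build an inverted index from each word to the ids of the records containing it, use the longest word of the search string to pick candidate records, then run the substring test only on those. It assumes whole-word searches, as above; the dicts are hypothetical stand-ins for the separate table and the main table.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def build_word_index(records: dict[int, str]) -> dict[str, set[int]]:
    """word -> ids of the records containing it (stands in for the separate table)."""
    index: dict[str, set[int]] = defaultdict(set)
    for rec_id, text in records.items():
        for word in tokenize(text):
            index[word].add(rec_id)
    return index

def search(records: dict[int, str], index: dict[str, set[int]], phrase: str) -> list[int]:
    """Narrow by the longest word, then run the substring test only on candidates."""
    phrase_words = tokenize(phrase)
    if not phrase_words:
        return []
    longest = max(phrase_words, key=len)
    candidates = index.get(longest, set())
    return sorted(i for i in candidates if phrase.lower() in records[i].lower())

records = {1: "my search string here", 2: "search elsewhere", 3: "a different string"}
index = build_word_index(records)
print(search(records, index, "my search string"))  # [1]
print(search(records, index, "search"))            # [1, 2]
```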
I simply use SELECT ColumnName1, ColumnName2, ..... WHERE LOCATE(substr, ColumnNameX) <> 0
to get rows where ColumnNameX contains the substring.
Replace <> with = to get rows NOT containing the substring.