I have table 'key' with rows
happy new year
I love NY
I have table 'content' with rows
I want to say you: happy new year, Mike
I saw that banner with I love NY really
I would like to find the phrases from table 'key' inside table 'content' and replace them with links (hrefs). The table 'content' would then look like:
I want to say you: <a href="happy-new-year">happy new year</a>, Mike
I saw that banner with <a href="I-love-NY">I love NY</a> really
Is it possible to do this in MySQL?
You can get pretty close with this:
update content c join
`key` k
on c.col like concat('%', k.col, '%')
set c.col = replace(c.col, k.col,
concat('<a href="', replace(k.col, ' ', '-'), '">',
k.col, '</a>')
);
When a row in content matches several keys, the update applies only one of those matches, so only one key value is replaced per row. That key is replaced throughout the entire string, though: if the same key appears multiple times, every occurrence is replaced.
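If replacing only one key per row is a problem, one option is to do the replacement in application code instead. The following is a minimal Python sketch, not part of the SQL answer above: the function name link_keys is made up for illustration, and it assumes the rows have already been read from both tables. It replaces every key in a single pass and builds the href the same way as the SQL (spaces become hyphens).

```python
import re

def link_keys(text, keys):
    """Wrap every occurrence of every key phrase in an anchor tag, in one pass."""
    if not keys:
        return text
    # Longest keys first so a longer phrase wins over a shorter one it contains.
    ordered = sorted(keys, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))

    def to_link(match):
        phrase = match.group(0)
        # Same href scheme as the SQL answer: spaces become hyphens.
        return '<a href="{}">{}</a>'.format(phrase.replace(" ", "-"), phrase)

    return pattern.sub(to_link, text)

keys = ["happy new year", "I love NY"]
linked = link_keys("I saw that banner with I love NY really", keys)
```

Because the substitution runs once over the original text, a link that has already been inserted is never scanned again for other keys.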
First off, there seems to be no way to get an exact match using a full-text search. This is a highly discussed issue with the full-text search method, and there are lots of different solutions for achieving the desired result, but most seem very inefficient. Since I'm forced to use full-text search due to the volume of my database, I recently had to implement one of these solutions to get more accurate results.
I could not use the ranking results from the full-text search because of how it works. For instance, if you searched for a movie called Toy Story and there was also a movie called The Story Behind Toy Story, that would come up instead of the exact match because it found the word Story twice and Toy.
I do track my own ranking, which I call "Popularity": each time a user accesses a record, the number goes up. I use this data point to weight my results and help determine what the user might be looking for.
I also have the issue that I sometimes need to fall back to a LIKE search and not return an exact match, e.g. searching Goonies should return The Goonies (the most popular result).
So here is an example of my current stored procedure for achieving this:
DECLARE @Title varchar(255)
SET @Title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE @Title2 varchar(255)
SET @Title2 = REPLACE(@Title, '"', '')
--get top 100 results using full-text search and sort them by popularity
SELECT TOP(100) id, title, popularity As Weight into #TempTable FROM movies WHERE CONTAINS(title, @Title) ORDER BY [Weight] DESC
--check if exact match can be found
IF EXISTS(select * from #TempTable where Title = @Title2)
--return exact match
SELECT TOP(1) * from #TempTable where Title = @Title2
ELSE
--no exact match found, try using like with wildcards
SELECT TOP(1) * from #TempTable where Title like '%' + @Title2 + '%'
DROP TABLE #TempTable
This stored procedure is executed about 5,000 times a minute, and crazily enough it's not bringing my server to its knees. But I really want to know if there is a more efficient approach to this. Thanks.
You should use full text search CONTAINSTABLE to find the top 100 (possibly 200) candidate results and then order the results you found using your own criteria.
It sounds like you'd like to ORDER BY
exact match of the phrase (=)
the fully matched phrase (LIKE)
higher value for the Popularity column
the Rank from the CONTAINSTABLE
But you can toy around with the exact order you prefer.
In SQL that looks something like:
DECLARE @title varchar(255)
SET @title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE @title2 varchar(255)
SET @title2 = REPLACE(@title, '"', '')
SELECT
m.ID,
m.title,
m.Popularity,
k.Rank
FROM Movies m
INNER JOIN CONTAINSTABLE(Movies, title, @title, 100) as [k]
ON m.ID = k.[Key]
ORDER BY
--exact match first
CASE WHEN m.title = @title2 THEN 0 ELSE 1 END,
--then titles containing the whole phrase
CASE WHEN m.title LIKE '%' + @title2 + '%' THEN 0 ELSE 1 END,
m.popularity desc,
k.rank desc
See SQLFiddle
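To make the ordering logic easier to see outside of SQL, here is a rough Python sketch of the same four-level sort. The sample rows and their popularity and rank numbers are invented for illustration, and it assumes that a higher rank means a better full-text match.

```python
def order_candidates(candidates, phrase):
    """Sort (title, popularity, rank) rows: exact match first, then titles
    containing the phrase, then higher popularity, then higher rank."""
    needle = phrase.lower()

    def sort_key(row):
        title, popularity, rank = row
        lowered = title.lower()
        return (
            0 if lowered == needle else 1,   # exact match (=)
            0 if needle in lowered else 1,   # phrase match (LIKE '%...%')
            -popularity,                     # most popular first
            -rank,                           # best full-text rank first
        )

    return sorted(candidates, key=sort_key)

# Hypothetical candidate rows, as the full-text search might return them.
candidates = [
    ("The Story Behind Toy Story", 900, 80),
    ("Toy Story 2", 700, 60),
    ("Toy Story", 500, 50),
]
ordered_titles = [row[0] for row in order_candidates(candidates, "Toy Story")]
```

The exact title wins even though it has the lowest popularity, and popularity only decides between the remaining phrase matches.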
This will give you the movies that contain the exact phrase "Toy Story", ordered by their popularity.
SELECT
m.[ID],
m.[Popularity],
k.[Rank]
FROM [dbo].[Movies] m
INNER JOIN CONTAINSTABLE([dbo].[Movies], [Title], N'"Toy Story"') as [k]
ON m.[ID] = k.[Key]
ORDER BY m.[Popularity]
Note the above would also give you "The Goonies Return" if you searched "The Goonies".
I've got the feeling you don't really like the fuzzy part of full-text search, but you do like the performance part.
Maybe this is a path: if you insist on getting the EXACT match before a weighted match, you could hash the value. For example: 'Toy Story' -> bring to lowercase -> toy story -> hash into 4de2gs5sa (with whatever hash you like), and then perform the search on the hash.
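A small Python sketch of that hashing idea, under a few assumptions of mine: SHA-1 is just one arbitrary choice of hash, the whitespace collapsing is an extra normalization step I added, and the dictionary stands in for an indexed hash column in the database.

```python
import hashlib

def title_hash(title):
    """Hash a normalized title so exact lookups can go through an indexed column."""
    # Lowercase and collapse whitespace so 'Toy  Story' and 'toy story' agree.
    normalized = " ".join(title.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# Stand-in for a table with a precomputed, indexed hash column.
index = {title_hash(t): t for t in ["Toy Story", "The Story Behind Toy Story"]}

match = index.get(title_hash("  toy   STORY "))
miss = index.get(title_hash("Story"))
```

A lookup on the hash either finds the exact title or nothing, so the fuzzy full-text search is only needed as a fallback.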
In Oracle I've used UTL_MATCH for similar purposes. (http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/u_match.htm)
Even though an algorithm like Jaro Winkler might take a while when you compare the title column from table 1 against table 2, you can improve performance by only partially joining the 2 tables. In some cases I have compared person names in table 1 with table 2 using Jaro Winkler, but limited the results not just to those above a certain Jaro Winkler threshold, but also to names where the first letter is the same in both tables. For instance, I would compare Albert with Aden, Alfonzo, and Alberto using Jaro Winkler, but not Albert with Frank (limiting the number of situations where the algorithm needs to be used).
Jaro Winkler may actually be suitable for movie titles as well. Although you are using SQL server (can't use the utl_match package) it looks like there is a free library called "SimMetrics" which has the Jaro Winkler algorithm among other string comparison metrics. You can find detail on that and instructions here: http://anastasiosyal.com/POST/2009/01/11/18.ASPX?#simmetrics
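The first-letter restriction can be sketched in Python roughly like this. Note that this is not Jaro Winkler: the standard library has no such function, so difflib.SequenceMatcher is swapped in as a stand-in similarity score, and the 0.7 threshold is an arbitrary value picked for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def similar_pairs(left, right, threshold=0.7):
    """Score only pairs that share a first letter, keeping those at or above threshold."""
    # Bucket the right-hand names by first letter (the "partial join").
    buckets = defaultdict(list)
    for name in right:
        if name:
            buckets[name[0].lower()].append(name)

    pairs = []
    for a in left:
        if not a:
            continue
        # Names in other buckets are never scored at all.
        for b in buckets.get(a[0].lower(), []):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs

pairs = similar_pairs(["Albert"], ["Aden", "Alfonzo", "Alberto", "Frank"])
matched_names = [b for _, b, _ in pairs]
```

Albert is scored against Aden, Alfonzo, and Alberto, while Frank is skipped before any comparison happens, which is where the savings come from on large tables.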
Ok, so here is the issue.
I have a table with some columns and 'subject' is one of the columns.
I need to get the first 10 characters of the 'subject' field, even if the field contains a string of 100 characters.
For example,
Table - tbl.
Columns - id, subject, value.
SQL Query:
SELECT subject FROM tbl WHERE id ='$id';
The result I am getting is, for example
Hello, this is my subject and how are you
I only require the first 10 characters
Hello, thi
I understand that I could remove the rest of the characters using PHP's substr(), but that's not possible in my case. I need MySQL to remove the excess characters. How can this be done?
Use the line below:
SELECT LEFT(subject , 10) FROM tbl
MySQL Doc.
SELECT SUBSTRING(subject, 1, 10) FROM tbl
Have a look at either Left or Substring if you need to chop it up even more.
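For a quick way to try the truncation end to end, here is a small Python sketch. It assumes an in-memory SQLite database as a stand-in for MySQL, so it uses SQLite's substr() where MySQL would take LEFT() or SUBSTRING(), and it passes the id as a bound parameter rather than interpolating '$id' into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, subject TEXT, value TEXT)")
conn.execute(
    "INSERT INTO tbl (id, subject) VALUES (?, ?)",
    (1, "Hello, this is my subject and how are you"),
)

# The database does the truncation; only 10 characters come back.
row = conn.execute(
    "SELECT substr(subject, 1, 10) FROM tbl WHERE id = ?", (1,)
).fetchone()
short_subject = row[0]
```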
Google and the MySQL docs are a good place to start - you'll usually not get such a warm response if you've not even tried to help yourself before asking a question.
I want to search using MATCH in MySQL.
I have one table that contains "name" and "category" fields. The "category" field contains the values book, books, books.
What I want is that when I search for "book" or "books" in the category field, it gives me all 3 rows.
Can anyone help me with this?
Thanks
To clarify this question: I actually have a website with a search field. When a user enters something in it, my site should search the category field. The real problem is that users sometimes enter "book", sometimes "books", sometimes "car", sometimes "cars". The trailing "s" is giving me a headache. I know that what the user really wants is to find everything related to book or car. So what should I do? Should I strip every trailing "s", or is there a better solution?
Ari
select *
from table
where category LIKE '%book%'
Trim the user input to an acceptable length and try this query:
$userInput = substr($input, 0, 4);
select * from table where category like "%$userInput%"
If you are running the query from PHP, for example, you could prepare the query there and then use a simple regular expression:
<?php
$term = 'book';
if (substr($term, -1) == 's') { // if the term ends in an s
$term = substr($term, 0, -1); // the word without the s
}
//TODO: escape $term to prevent SQL injection
// In the pattern below, s? matches zero or one 's' character
$query = "
SELECT * FROM table
WHERE category REGEXP '{$term}s?'
";
Searching with MATCH() requires a fulltext index on column category, which might be overkill.
If you really just want those two cases, you could write
select * from table where
category = 'book' or category = 'books'
With Oddant's answer you might also get results like 'probookcover' or whatever.
If you want it to be case insensitive you have multiple options.
select * from table where
lower(category) = 'book' or lower(category) = 'books'
or
select * from table where
category like 'book' or category like 'books'
Alternatively you could also do
select * from table where
category like 'book%'
which gets you all rows whose category starts with book, but you might also get 'bookcover'.
EDIT: Considering your comment:
Like I said, match() is overkill, therefore I would do it like this:
select * from table where
category = whatYourUserEnters OR category = substring(whatYourUserEnters, 1, length(whatYourUserEnters) - 1)
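A rough Python sketch of that last idea, searching for both the singular and the plural form of whatever the user typed. The items table, its rows, and the helper names are invented for the example, SQLite stands in for MySQL, and stripping a single trailing "s" is a naive rule that does not handle plurals like "boxes" or "categories".

```python
import sqlite3

def category_forms(term):
    """Return (singular, plural) guesses for a search term by toggling a trailing 's'."""
    term = term.strip().lower()
    stem = term[:-1] if term.endswith("s") and len(term) > 1 else term
    return stem, stem + "s"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", "book"), ("b", "books"), ("c", "books"), ("d", "bookcover")],
)

def search(term):
    singular, plural = category_forms(term)
    # Bound parameters keep user input out of the SQL text.
    sql = "SELECT name FROM items WHERE lower(category) IN (?, ?) ORDER BY name"
    return [row[0] for row in conn.execute(sql, (singular, plural))]

results = search("Books")
```

Both "book" and "books" return the same three rows, and 'bookcover' is not matched because the comparison is an exact one on each form.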
I have a table that has a book title field. I would like to be able to sort the records like this:
The Ancient Alligator
Aunt Annie's Alligator
A Complete Guide to Alligators
Countrified Alligators
Don't Touch the Alligator!
An Effortless Alligator Hunt
and so on, ignoring "A", "An", & "The" when they appear as the first word of the title. (They could also be ignored anywhere in the title.)
I know these are stopwords in SQL Server 2008, so they can be ignored if someone uses them in a search.
But is there a way to make them ignored by ORDER BY? (If it makes a difference, the query will use a LinqDataSource in ASP.NET.)
Thanks!
Computing a sort key by using replace() won't scale if you have a large number of records.
The best way is to add an additional table field containing the title with A/An/The etc prefixes removed and make sure it has an index to speed up sorting. Then you can just order by this new field but display the original unchanged field.
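The value for such a sort column could be computed once, when a row is inserted or updated. A minimal Python sketch, assuming only a leading "A", "An", or "The" should be dropped (the function name title_sort_key is made up here):

```python
import re

# Matches a leading article followed by whitespace, case-insensitively.
_LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)

def title_sort_key(title):
    """Value to store in the extra, indexed sort column."""
    return _LEADING_ARTICLE.sub("", title.strip()).lower()

titles = [
    "An Effortless Alligator Hunt",
    "Countrified Alligators",
    "The Ancient Alligator",
    "Don't Touch the Alligator!",
    "A Complete Guide to Alligators",
    "Aunt Annie's Alligator",
]
ordered = sorted(titles, key=title_sort_key)
```

Only the first word is touched, so "Aunt Annie's Alligator" keeps its "A" and the "the" inside "Don't Touch the Alligator!" is left alone.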
Something like this perhaps.
;with T(Title) as
(
select 'The Ancient Alligator' union all
select 'Aunt Annie''s Alligator' union all
select 'A Complete Guide to Alligators' union all
select 'Countrified Alligators' union all
select 'Don''t Touch the Alligator!' union all
select 'An Effortless Alligator Hunt'
)
select Title
from T
order by replace(
replace(
replace(T.Title,
'A ', ''),
'An ', ''),
'The ', '')
Result:
Title
------------------------------
The Ancient Alligator
Aunt Annie's Alligator
A Complete Guide to Alligators
Countrified Alligators
Don't Touch the Alligator!
An Effortless Alligator Hunt
I have a list of movies that I have grouped by letter. Naturally, the movies starting with the letter "T" have about 80% of movies that begin with "The". Movies such as "The Dark Knight" should appear in the "D" list, and preferably in the "T" as well. Any way I can do that?
I use the following code in the WHERE clause to display movies that start with a certain letter, ignoring "the", but this also had a convenient side effect of having a movie such as "The Dark Knight" appear for letter "D" and "T".
WHERE movie_title REGEXP CONCAT('^(the )?', '$letter')
I would like to achieve this when I echo out all the movies that are in the database.
If you are going to be performing this query frequently, you will want to create a separate field in the table with the 'sorted' name. Using regular expressions or other operations on the column makes it impossible for MySQL to take advantage of an index.
So, the simplest and most efficient solution is to add a movie_title_short field, which contains movie_title without the "The" or "A". Be sure to add an index to the movie_title_short field too!
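If the grouping happens in application code when the list is echoed out, the "appear under both D and T" behaviour from the question can be sketched like this in Python. The helper names are hypothetical, and only a leading "The" is treated specially, as in the question's REGEXP.

```python
def index_letters(title):
    """Letters a title should be listed under: its first letter, plus the
    first letter after a leading 'The'."""
    words = title.split()
    letters = {words[0][0].upper()} if words else set()
    if len(words) > 1 and words[0].lower() == "the":
        letters.add(words[1][0].upper())
    return letters

def group_by_letter(titles):
    """Map each letter to the titles listed under it, in input order."""
    groups = {}
    for title in titles:
        for letter in index_letters(title):
            groups.setdefault(letter, []).append(title)
    return groups

groups = group_by_letter(["The Dark Knight", "Die Hard", "Titanic"])
```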
As Carl said, I'd build this into its own indexable field to avoid having to compute it each time. I'd recommend doing it in a slightly different way to avoid redundancy though.
movies (id, name, namePrefix)
eg:
| Dark Knight | The |
| Affair To Remember | An |
| Beautiful Mind | A |
This way you can show these movies in two different ways: "name, namePrefix" or "namePrefix name" and can be sorted accordingly.
select right(movie_title, char_length(movie_title)-4) as movie_title
from movies
where movie_title like 'the %'
union
select movie_title
from movies
You can use the mysql replace function in the select clause...
select replace(movie_title,'The ','') from ... order by replace(movie_title,'The ','')
Just had that problem myself... solution is:
SELECT * FROM movies WHERE title REGEXP '^d' AND title NOT REGEXP '^the ' OR title REGEXP '^the d'
This will give you only results that start with "The D" or "D".
Use this:
SELECT * FROM movies ORDER BY TRIM(LEADING 'the ' FROM LOWER(`movie_title`));