Ordering mysql result by number of regexp matches - mysql

I've the following query. It selects all posts where the title contains the words green, blue or red.
SELECT id, title FROM post WHERE title REGEXP '(green|blue|red)'
I would like to sort the results so that the title with the most matches (all three words), and thus the most relevant one, is listed first. Is this possible in this scenario and, if so, how would I go about it?
Thanks

You must split the regex, either into separate conditions or into separate queries:
SELECT results.id, results.title, COUNT(*) AS match_count FROM (
SELECT id, title FROM `post` WHERE `title` LIKE '%blue%'
UNION ALL SELECT id, title FROM `post` WHERE `title` LIKE '%red%'
UNION ALL SELECT id, title FROM `post` WHERE `title` LIKE '%green%'
) AS results GROUP BY results.id, results.title ORDER BY match_count DESC;
Note: I used LIKE instead of REGEXP because, once you split the condition, you no longer need a regular expression for the patterns in your example. LIKE is a bit faster than REGEXP, but if your patterns are more complex, you can always switch back.
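Another way to count matches, without a derived table, is to add up the individual comparisons, since MySQL evaluates a comparison to 1 or 0. This is only a minimal sketch, assuming the post table from the question; match_count is a name made up for illustration:

```sql
-- Each REGEXP test evaluates to 1 (match) or 0 (no match) in MySQL,
-- so the sum is the number of distinct words found in the title.
SELECT id, title,
       (title REGEXP 'green') + (title REGEXP 'blue') + (title REGEXP 'red') AS match_count
FROM post
WHERE title REGEXP 'green|blue|red'
ORDER BY match_count DESC;
```

This counts each word at most once per title, so a title containing "red" twice still scores 1 for that word.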

Related

MySQL Fulltext Search: One-to-many Relationships

I'm attempting to implement a search function on two tables with a one-to-many relationship. Think of it as a post with multiple tags. Each tag has its own row in the tag table.
I'd like to retrieve a post if all of the search terms can be found in either a) the post text, b) the post tags or c) both.
Let's say I've created my tables like this:
CREATE TABLE post (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
text VARCHAR(100) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE tag (
id MEDIUMINT NOT NULL AUTO_INCREMENT,
name VARCHAR(30) NOT NULL,
post MEDIUMINT NOT NULL,
PRIMARY KEY (id)
);
And I create indexes like this:
CREATE FULLTEXT INDEX post_idx ON post(text);
CREATE FULLTEXT INDEX tag_idx ON tag(name);
If my search query were "TermA TermB" and wanted to search just in the post text, I'd formulate my SQL query like this:
SELECT * FROM post WHERE MATCH(text) AGAINST('+TermA +TermB' IN BOOLEAN MODE);
Is there a way to add tags into the mix? My previous attempt was this:
SELECT * FROM post
RIGHT JOIN tag ON tag.post = post.id
WHERE MATCH(post.text) AGAINST('TermA TermB' IN BOOLEAN MODE)
OR MATCH(tag.name) AGAINST('TermA TermB' IN BOOLEAN MODE);
The problem is, this is only an any words query and not an all words query. By this I mean, I'd like to retrieve the post if TermA is in the text and TermB is in the tags.
What am I missing here? Is this even possible using a fulltext search? Is there a better way to approach this?
Try this one:
SELECT post.*
FROM post
INNER JOIN (SELECT post, GROUP_CONCAT(name SEPARATOR ' ') tags FROM tag GROUP BY post) tag ON post.id=tag.post
WHERE MATCH(post.text) AGAINST('+TermA +TermB' IN BOOLEAN MODE)
OR MATCH(tags) AGAINST('+TermA +TermB' IN BOOLEAN MODE)
The following might also return results that match in either the content or the tags, but it didn't work in MySQL 5.1:
SELECT post.*, GROUP_CONCAT(tag.name SEPARATOR ' ') tags
FROM post
LEFT JOIN tag ON post.id=tag.post
GROUP BY post.id
HAVING MATCH(post.text,tags) AGAINST('+TermA +TermB' IN BOOLEAN MODE)
so I rewrote it as:
SELECT post.*, tags
FROM post
LEFT JOIN (SELECT post, GROUP_CONCAT(tag.name SEPARATOR ' ') tags FROM tag GROUP BY post) tags ON post.id=tags.post
WHERE MATCH(post.text, tags) AGAINST('+TermA +TermB' IN BOOLEAN MODE)
This is possible, but I'm guessing that in your Tags table, you have one row for each tag per post. So one row containing the tag 'TermA' for post 1 and another record with the tag 'TermB', right?
The all words query (with +) only returns rows where the searched field contains all the specified words. For the tags table, that is never the case.
One possible solution would be to store all tags in a single field in the posts table itself. Then it would be easy to do advanced matching on the tags as well.
Another possibility is to change the condition for tags altogether. That is, use an all query for the text and an any query for the tags. To do that, you'll have to modify the search query yourself, which can fortunately be as easy as removing the plusses from the query.
You can also query for an exact match, like this:
SELECT * FROM post p
WHERE
MATCH(p.text) AGAINST('TermA TermB' IN BOOLEAN MODE)
AND
/* Number of matching tags .. */
(SELECT COUNT(*) FROM tag t
WHERE
t.post = p.id
AND t.name IN ('TermA', 'TermB'))
= /* .. must be .. */
2 /* .. number of searched tags */
In this query, I count the number of matching tags. In this case I want it to be exactly 2, meaning that both tags match (provided that tags are unique per post). You could also check for >= 1 to see if any tags match.
But as you can see, this also requires parsing of the search string. You will have to remove the plusses (or even check their existence to understand whether you want 'any' or 'all'). And you will have to split it as well to get the number of searched words, and get the separate words themselves.
All in all, adding all tags to a 'tags' field in post is the easiest way. Not ideal from a normalisation point of view, but that is manageable, I think.
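If denormalising is not an option, the "every term must appear in either the text or a tag" requirement can also be written as one condition per search term. This is a sketch only, assuming the post(id, text) and tag(id, name, post) tables and the FULLTEXT indexes from the question:

```sql
-- One block per search term: the term must be found in the post text
-- or in at least one of the post's tag rows.
SELECT p.*
FROM post p
WHERE ( MATCH(p.text) AGAINST('+TermA' IN BOOLEAN MODE)
        OR EXISTS (SELECT 1 FROM tag t
                   WHERE t.post = p.id
                     AND MATCH(t.name) AGAINST('+TermA' IN BOOLEAN MODE)) )
  AND ( MATCH(p.text) AGAINST('+TermB' IN BOOLEAN MODE)
        OR EXISTS (SELECT 1 FROM tag t
                   WHERE t.post = p.id
                     AND MATCH(t.name) AGAINST('+TermB' IN BOOLEAN MODE)) );
```

As with the counting approach, the search string still has to be split into words by the application so that one block can be generated per term.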
You can search on both text and tags.
SELECT *
FROM post
WHERE MATCH(text,tags) AGAINST('+TermA +TermB' IN BOOLEAN MODE)
To get this to work you'll need to make a FULLTEXT index for both columns together.
CREATE FULLTEXT INDEX keywords ON post(text,tags)
In Boolean search mode this should do what you want.

LIKE condition : why '%,x,' doesn't work but '%,x,%' works

I have a table called images.
It has a varchar column called category, which stores all the categories of an image separated by commas.
In one row I have:
category : ,26,25,
This query works fine:
SELECT *
FROM `images`
WHERE `confirm` =1
AND `category` LIKE '%,25,%' AND `category` LIKE '%,26,%'
LIMIT 0 , 20
and I get all the rows that have both 25 and 26 in their category.
But why doesn't this one work?
SELECT *
FROM `images`
WHERE `confirm` =1
AND `category` LIKE '%,25,' AND `category` LIKE '%,26,'
LIMIT 0 , 20
LIKE matches the entire string. LIKE '%,26,' matches strings that end in ,26,, not strings that contain it. You need % on both ends if you want to search for a substring anywhere.
LIKE must match the entire value in the table. If there is content in the table after the "25,", then LIKE '%,25,' will not match it.
If you want regular expression matching in mysql, you can use RLIKE:
AND category RLIKE ',25,' AND category RLIKE ',26,'
but if you use LIKE, you have to match the whole thing.
LIKE is an exact match against the whole string, not a search for something at an arbitrary position inside it. You need wildcards to say that you want a match somewhere inside.
The expressions:
AND `category` LIKE '%,25,' AND `category` LIKE '%,26,'
are looking for rows where category ends in ',25,' AND ends in ',26,' at the same time. Clearly, this is not possible.
You can also phrase this in MySQL as:
AND find_in_set(25, category) > 0 and find_in_set(26, category) > 0
Also, you should have a separate table that has one row per category. Such queries would be much easier and more efficient with a proper relational data structure.
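The anchoring behaviour of LIKE described in these answers can be checked directly on the sample value from the question. The column aliases below are made up for illustration:

```sql
-- LIKE has to match the whole value, so a pattern without a trailing %
-- only matches values that end with the given text.
SELECT ',26,25,' LIKE '%,25,'  AS ends_with_25,   -- 1: the value ends in ,25,
       ',26,25,' LIKE '%,26,'  AS ends_with_26,   -- 0: ,26, is not at the end
       ',26,25,' LIKE '%,26,%' AS contains_26;    -- 1: ,26, appears somewhere
```

Because the second condition is 0, the AND of the two patterns without a trailing % can never be true for this row.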

Ordering a Union Query in MS Access SQL

OK I have a particularly nasty union ordering problem so any help would be appreciated.
The scenario is this:
Member Table with the following records (actual data):
REI882
YUI987
POBO37
NUBS26
BTBU12
MZBY10
TYBW54
(These are listed in the order I want them back from my query.)
There are a number of business rules about the construction of these MemberIDs which I believe are unrelated to the sort. They're historic and set in stone. I'm stuck with them. They indicate seniority of the member.
The ordering is done from the last 4 characters in the ID, ascending. The first two characters of the ID are completely meaningless as far as the sort is concerned.
So the topmost possible record is ??A001 (most senior) and the lowest possible record is ??ZZ99 (least senior).
When I query my member table the list I get back must display most senior at top... Obviously a standard sort does not work. This is what I have to date:
The first of these queries deals with sorting members whose ID only has 1 leading letter. The second deals with those with 2 leading letters.
SELECT * FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1)) **check the 4th character is a digit
ORDER BY (Mid([Member.ID],3,1)), (Mid([Member.ID],4,1)), (Mid([Member.ID],5,1)), (Mid([Member.ID],6,1))
) t1
UNION
SELECT * FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=0)) **check the 4th character is a letter
ORDER BY (Mid([Member.ID],3,1)), (Mid([Member.ID],4,1)), (Mid([Member.ID],5,1)), (Mid([Member.ID],6,1))
) t2
But I get CRAZY results with the union! If I run each of the selects individually - no problem my funky (heavily reliant on some nasty string manipulation in access!) sort works exactly as I want it.
I understand this is pretty complicated but I hope I've explained it clearly and that someone is up for some kudos for figuring it out!!!
edit: The result from my query is seemingly random:
YUI987
MZBY10
NUBS26
BTBU12
REI882
POBO37
TYBW54
An ORDER BY inside a SELECT that is combined with another SELECT by UNION is not correct; the sort has to be applied to the result of the UNION as a whole.
See Specifying a conditional order here
You can use this:
SELECT ID FROM(
(SELECT Member.ID,1 AS T,Left([Member.ID],2) AS Part1, Right([Member.ID],4) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],3,1)))=-1)))
UNION
(SELECT Member.ID,2 AS T,Left([Member.ID],3) AS Part1, Right([Member.ID],3) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1) and ((IsNumeric(Mid([Member.ID],3,1)))=0)))
UNION
(SELECT Member.ID,3 AS T,Left([Member.ID],4) AS Part1, Right([Member.ID],2) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],5,1)))=-1) and ((IsNumeric(Mid([Member.ID],4,1)))=0)))
ORDER BY T,Part1,Part2)
@Justin Kirk: I don't know exactly what your problem is, but I hope this helps.
Why are you not using the RIGHT function?
Something like
SELECT ID
FROM (
SELECT ID
FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1)) **check the 4th character is a digit
) t1
UNION
SELECT ID
FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=0)) **check the 4th character is a letter
) t2
) t3
ORDER BY RIGHT(ID,4)
How about skipping the UNION?
SELECT members.ID
FROM members
ORDER BY Right([ID],3), Right(id,4)
Based on the new rules, this mess may work.
SELECT
Len(IIf([textId] Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left([textid],2),
IIf([textId] Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left([textid],3),
IIf([textId] Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left([textid],4),"_")))) AS Ln,
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left(textid,2),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left(textid,3),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left(textid,4),"_"))) AS Alpha,
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Val(Right(textid,4)),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Val(Right(textid,3)),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Val(Right(textid,2)),0))) AS Numbr,
table.textid
FROM table
ORDER BY
Len(IIf([textId] Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left([textid],2),
IIf([textId] Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left([textid],3),
IIf([textId] Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left([textid],4),"_")))),
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left(textid,2),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left(textid,3),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left(textid,4),"_"))),
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Val(Right(textid,4)),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Val(Right(textid,3)),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Val(Right(textid,2)),0)))

SQL search result ranking

I have a table called objects with the following columns:
object_id,
name_english(vchar),
name_japanese(vchar),
name_french(vchar),
object_description
for each object.
When a user performs a search, they may enter English, Japanese or French, and my SQL statement is:
SELECT
o.object_id,
o.name_english,
o.name_japanese,
o.name_french,
o.object_description
FROM
objects AS o
WHERE
o.name_english LIKE CONCAT('%',:search,'%') OR
o.name_japanese LIKE CONCAT('%',:search,'%') OR
o.name_french LIKE CONCAT('%',:search,'%')
ORDER BY
o.name_english, o.name_japanese, o.name_french ASC
And some of the entries are like:
Tin spoon,
Tin Foil,
Doctor Martin Shoes,
Martini glass,
Cutting board,
Ting Soda.
So when the user searches for the word "Tin", the query returns all of these. Instead, I want to return only the results that specifically include the term "Tin", or display the results ranked by relevance. How can I achieve that?
Thanks.
You can use MySQL FULLTEXT indices to do that. This requires the MyISAM table type (InnoDB only gained FULLTEXT support in MySQL 5.6), an index on (name_english, name_japanese, name_french, object_description) or whatever fields you want to search on, and the appropriate use of the MATCH ... AGAINST operator on exactly that set of columns.
See the manual at http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html, and the examples on the following page http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
After running the query above, you will get all sorts of results, including ones you are not interested in, but you can then apply regular expressions to the result set returned by the MySQL server to filter out what you need.
This should do the trick - you may have to filter out duplicates, but the basic idea is obvious.
SELECT
`object`.`object_id`,
`object`.`name_english`,
`object`.`name_japanese`,
`object`.`name_french`,
`object`.`object_description`, 1 as ranking
FROM `objects` AS `object`
WHERE `object`.`name_english` LIKE CONCAT(:search,'%') OR `object`.`name_japanese` LIKE CONCAT(:search,'%') OR `object`.`name_french` LIKE CONCAT(:search,'%')
union
SELECT
`object`.`object_id`,
`object`.`name_english`,
`object`.`name_japanese`,
`object`.`name_french`,
`object`.`object_description`, 10 as ranking
FROM `objects` AS `object`
WHERE `object`.`name_english` LIKE CONCAT('%',:search,'%') OR `object`.`name_japanese` LIKE CONCAT('%',:search,'%') OR `object`.`name_french` LIKE CONCAT('%',:search,'%')
ORDER BY ranking, name_english, name_japanese, name_french ASC
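The duplicates can be avoided by computing the rank in a single pass with CASE instead of a UNION. The following is only a sketch under some assumptions: the user variable @search stands in for the :search placeholder, the rank values are arbitrary, and "whole word" is approximated by padding the name with spaces, which will not help for text written without spaces such as Japanese:

```sql
SET @search = 'Tin';

SELECT o.object_id, o.name_english, o.name_japanese, o.name_french,
       o.object_description,
       CASE
         -- Rank 1: the term appears as a whole, space-delimited word.
         WHEN CONCAT(' ', o.name_english, ' ')  LIKE CONCAT('% ', @search, ' %')
           OR CONCAT(' ', o.name_japanese, ' ') LIKE CONCAT('% ', @search, ' %')
           OR CONCAT(' ', o.name_french, ' ')   LIKE CONCAT('% ', @search, ' %') THEN 1
         -- Rank 2: a name merely starts with the term (e.g. "Ting Soda").
         WHEN o.name_english  LIKE CONCAT(@search, '%')
           OR o.name_japanese LIKE CONCAT(@search, '%')
           OR o.name_french   LIKE CONCAT(@search, '%') THEN 2
         -- Rank 3: the term only appears inside a word (e.g. "Martini glass").
         ELSE 3
       END AS ranking
FROM objects AS o
WHERE o.name_english  LIKE CONCAT('%', @search, '%')
   OR o.name_japanese LIKE CONCAT('%', @search, '%')
   OR o.name_french   LIKE CONCAT('%', @search, '%')
ORDER BY ranking, o.name_english, o.name_japanese, o.name_french;
```

With the sample entries from the question, "Tin spoon" and "Tin Foil" would get rank 1, "Ting Soda" rank 2, and the remaining matches rank 3. To return only whole-word matches, the rank 1 condition can be used as the WHERE clause instead.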

MySQL query results used in LIKE

I am trying to write a query to pull all the rows that contain a username from a large list of usernames in a field.
For example, the table contains a column called 'Worklog' which contains comments made by users and their username. I need to search that field for all user names that are contained in a list I have.
I have tried a few different things but can't get anything to work. So far, this is kind of what I have tried:
SELECT *
FROM `JULY2010`
WHERE `WorkLog`
IN (
SELECT CONCAT( '%', `UserName` , '%' )
FROM `OpsAnalyst`
)
The problem is I need to use LIKE because it is searching a large amount of text, but I also have a large list that it is pulling from, and that list needs to be dynamic because the people that work here are changing frequently. Any ideas?
SELECT *
FROM `JULY2010`
WHERE `WorkLog` REGEXP
(SELECT CONCAT( `UserName`, '|')
FROM `OpsAnalyst`)
I slightly modified this and used GROUP_CONCAT() and now my query looks like this:
SELECT *
FROM JULY2010
WHERE `WorkLog`
REGEXP (
SELECT GROUP_CONCAT(`UserName` SEPARATOR '|') FROM `OpsAnalyst`
)
I am now getting a result set, but it seems like I'm not getting as many results as I should be. I'm going to have to look into it a little more to figure out what the problem is.
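One possible explanation, which the thread does not confirm: GROUP_CONCAT() truncates its result at group_concat_max_len, which defaults to 1024 bytes, so a long list of usernames may be cut short and the later names never reach the pattern. A sketch of two ways around that, assuming the tables from the question:

```sql
-- Option 1: raise the GROUP_CONCAT limit for this session before running
-- the REGEXP query (the value here is an arbitrary example).
SET SESSION group_concat_max_len = 1000000;

-- Option 2: avoid building one big pattern and join on LIKE instead.
-- DISTINCT removes rows that mention more than one username.
SELECT DISTINCT j.*
FROM `JULY2010` AS j
JOIN `OpsAnalyst` AS o
  ON j.`WorkLog` LIKE CONCAT('%', o.`UserName`, '%');
```

The join also sidesteps usernames that contain regular expression metacharacters, although % and _ inside a username would still act as LIKE wildcards.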