MySQL - Output an "Other" column based on multiple columns

I am using MySQL REGEXP to assign reviews to different topics and output them in separate columns. The problem is that some reviews may not be assigned to any topic, which is why I need an "Other" column. How do I modify the query below to achieve that?
SELECT
text,
text REGEXP 'keywords' AND text REGEXP 'other keywords' AND .... AS Cleanliness,
text REGEXP 'keywords' AND text REGEXP 'other keywords' AND .... AS Restaurant,
text REGEXP 'keywords' AND text REGEXP 'other keywords' AND .... AS Wifi
FROM review_table;
Note that a review can belong to multiple topics.
The end result should look like this:

One solution would be to create another REGEXP expression that represents the negation of all the other expressions, but that quickly becomes tedious to maintain.
Another option is to wrap the original query in a derived table and compute the additional column in the outer query. Since each topic column is already 0 or 1, this is as simple as:
SELECT x.*, (Cleanliness + Restaurant + Wifi = 0) AS Other
FROM (
-- original query
) x
Tip: in MySQL, a comparison or logical expression evaluates to 1 when true, 0 when false, and NULL if an operand is NULL. This means that this expression:
CASE
WHEN review REGEXP 'relevant keywords'
AND review REGEXP 'additional keywords if necessary'
THEN 1
ELSE 0
END AS 'Cleanliness'
Can also be written:
(
review REGEXP 'relevant keywords'
AND review REGEXP 'additional keywords if necessary'
) AS 'Cleanliness'
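A minimal sketch of the wrap-and-sum approach, run with Python's built-in sqlite3 module so it needs no server. SQLite stands in for MySQL here: it has no native REGEXP, so one is registered from Python's re module, and the sample reviews and keyword patterns are invented for illustration. The query shape (topic flags in a derived table x, Other computed in the outer query) mirrors the answer above.

```python
import re
import sqlite3

# SQLite has no built-in REGEXP operator; "x REGEXP y" calls a user
# function regexp(y, x), so the pattern arrives first. Case-insensitive
# matching approximates MySQL's default case-insensitive collations.
def regexp(pattern, value):
    if value is None:
        return None  # propagate NULL like SQL does
    return 1 if re.search(pattern, value, re.IGNORECASE) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE review_table (review TEXT)")
conn.executemany(
    "INSERT INTO review_table (review) VALUES (?)",
    [("The room was dirty",), ("Great breakfast, slow wifi",), ("Friendly staff",)],
)

# Hypothetical keyword patterns, one per topic.
QUERY = """
SELECT x.*, (Cleanliness + Restaurant + Wifi = 0) AS Other
FROM (
    SELECT review,
           (review REGEXP 'clean|dirty')      AS Cleanliness,
           (review REGEXP 'breakfast|dinner') AS Restaurant,
           (review REGEXP 'wifi|internet')    AS Wifi
    FROM review_table
) x
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because a review can match several topics, Other is 1 only when every topic flag is 0; a NULL review makes every flag, and therefore Other, NULL.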

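I think we can use the NOT operator:
CASE
WHEN NOT (review REGEXP 'relevant keywords'
AND review REGEXP 'additional keywords if necessary')
THEN 1
ELSE 0
END AS 'Irrelevant'
To flag a review that matches no topic at all, the condition inside NOT has to combine every topic's condition with OR.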
Reference: https://dev.mysql.com/doc/refman/5.7/en/regexp.html
Related: negate regex pattern in mysql

Related

How to make the most relevant search with MySQL?

I want to do a relevance-ordered search in MySQL, but whatever I try, I cannot get it to work. I tried a FULLTEXT search and a LIKE search, but neither produces the following order.
For example, for the search term "first second", the results should be ordered like this:
First Second Keyword -> (full word) will begin with
Keyword First Second Keyword -> (full word) in between or at the end
Firstsecond Keyword -> (adjacent word) will begin with
Keyword Firstsecond Keyword -> (adjacent word) in between or at the end
First Keyword -> (only the first word) will begin with
Keyword First Keyword -> (only the first word) in between or at the end
Second Keyword -> (only the second word) will begin with
Keyword Second Keyword -> (only the second word) in between or at the end
After those, the results should continue with all of the partial matches below, in this order:
Firstasdasd Keyword
Keyword asdasdfirst Keyword
Keyword asdasdfirstasdasd Keyword
Secondasdasd Keyword
Keyword asdasdsecond Keyword
Keyword asdasdsecondasdasd Keyword
Yeah, FULLTEXT won't do this kind of search. It's word-oriented.
What you want isn't really simple SQL. I think you want something like this to hit the first keyword.
SET @match := 'first';
SELECT MIN(priority) priority, value
FROM (
SELECT 1 priority, value FROM tbl WHERE value LIKE CONCAT(@match, ' %')
UNION ALL
SELECT 2 priority, value FROM tbl WHERE value LIKE CONCAT('% ', @match, ' %')
UNION ALL
SELECT 3 priority, value FROM tbl WHERE value LIKE CONCAT('%', @match, '%')
) results
GROUP BY value
ORDER BY MIN(priority), value;
Generally, you use a UNIONed series of SELECTs with priorities for the full-word and embedded-word matches, and keep the lowest priority number (the best match) for each row.
You'll need to elaborate on that general pattern to handle both keywords, and it could turn into a real pain in the neck to get right. Also, a LIKE pattern that starts with % cannot use an index, so these searches scan the whole table and won't be very fast.
If you're going to large scale with this, it might be worth investigating Sphinx.
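The priority scheme can be tried outside MySQL with a sketch like the following, using Python's sqlite3 module. SQLite has no @user variables and (in older versions) no CONCAT(), so a named parameter :word and the || operator take their place; the table contents are sample data. Like MySQL's default collations, SQLite's LIKE is case-insensitive for ASCII letters, which is why "First Keyword" matches the lowercase search term.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (value TEXT)")
conn.executemany(
    "INSERT INTO tbl (value) VALUES (?)",
    [("Keyword asdasdfirst Keyword",), ("First Keyword",),
     ("Keyword First Keyword",), ("Second Keyword",)],
)

# Priority 1: value starts with the whole word; 2: whole word elsewhere;
# 3: word embedded anywhere. Each row keeps its best (lowest) priority.
PRIORITY_QUERY = """
SELECT MIN(priority) AS priority, value
FROM (
    SELECT 1 AS priority, value FROM tbl WHERE value LIKE :word || ' %'
    UNION ALL
    SELECT 2, value FROM tbl WHERE value LIKE '% ' || :word || ' %'
    UNION ALL
    SELECT 3, value FROM tbl WHERE value LIKE '%' || :word || '%'
) AS results
GROUP BY value
ORDER BY MIN(priority), value
"""

def ranked(word):
    return conn.execute(PRIORITY_QUERY, {"word": word}).fetchall()

for priority, value in ranked("first"):
    print(priority, value)
```

Handling both keywords, as the answer notes, means adding further prioritized SELECTs (for example one for the full phrase) to the same UNION ALL.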

SQL regex works the wrong way

I'm trying to build a SQL statement that retrieves user names in the following order:
first the names that start with an Arabic letter, then the names that start with an English letter, then the names that start with special characters,
with each of the three groups sorted in ascending order.
This is my code:
SELECT `name` FROM `user`
order by case when substring(name,1,1) like 'N[أ-ي]' then 1
when substring(name,1,1) like '[a-zA-Z]' then 2
else 3
end
,name
The problem is that the CASE part always returns 3, so the statement sorts the names in the default order (special characters first, then English letters, then Arabic letters). What is wrong with my query?
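You need to use REGEXP, not LIKE: LIKE only understands the % and _ wildcards, so patterns such as '[a-zA-Z]' are compared literally and never match. Also drop the N from the Arabic pattern: inside the quotes it is a literal N, so 'N[أ-ي]' needs two characters and can never match the single character returned by substring(name,1,1). (Before MySQL 8.0, REGEXP works byte-wise and is not multibyte safe, so the Arabic range may still misbehave there; MySQL 8.0 uses ICU and handles it.)
SELECT `name` FROM `user`
order by case when substring(name,1,1) regexp '[أ-ي]' then 1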
when substring(name,1,1) regexp '[a-zA-Z]' then 2
else 3
end
,name
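A small sketch of the corrected ordering, run in SQLite through Python's sqlite3 module with a REGEXP function backed by Python's re. Both SQLite's substr() and Python's re work on characters rather than bytes, so this behaves like MySQL 8.0's ICU engine, not like the byte-wise engine of older versions. The sample names are made up, and SQLite's default binary collation (unlike a case-insensitive MySQL collation) is why the English names are all lowercase.

```python
import re
import sqlite3

# SQLite calls regexp(pattern, value) for "value REGEXP pattern".
def regexp(pattern, value):
    return 1 if value is not None and re.search(pattern, value) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute('CREATE TABLE "user" (name TEXT)')
conn.executemany('INSERT INTO "user" (name) VALUES (?)',
                 [("zed",), ("بدر",), ("_x",), ("bob",), ("أحمد",), ("#tag",)])

# Group 1: Arabic first letter, group 2: English letter, group 3: anything
# else; within each group, sort by name.
SORTED = """
SELECT name FROM "user"
ORDER BY CASE WHEN substr(name, 1, 1) REGEXP '[أ-ي]' THEN 1
              WHEN substr(name, 1, 1) REGEXP '[a-zA-Z]' THEN 2
              ELSE 3
         END,
         name
"""

names = [row[0] for row in conn.execute(SORTED)]
print(names)
```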
Reference: MySQL CASE statement and REGEXP

MySQL - need to find records without a period in them

I've been to the regexp page on the MySQL website and am having trouble getting the query right. I have a list of links and I want to find invalid links that do not contain a period. Here's my code that doesn't work:
select * from `links` where (url REGEXP '[^\\.]')
It's returning all rows in the entire database. I just want it to show me the rows where 'url' doesn't contain a period. Thanks for your help!
SELECT c1 FROM t1 WHERE c1 NOT LIKE '%.%'
Your regexp matches anything that contains a character that isn't a period. So if it contains foo.bar, the regexp matches the f and succeeds. You can do:
WHERE url REGEXP '^[^.]*$'
The anchors and repetition operator make this check that every character is not a period. Or you can do:
WHERE LOCATE('.', url) = 0
BTW, you don't need to escape . when it's inside [] in a regexp.
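The difference between the unanchored and anchored patterns is easy to check. This sketch uses SQLite via Python's sqlite3 module as a stand-in for MySQL, with a REGEXP function registered from Python's re (which, like REGEXP, succeeds if the pattern matches anywhere in the value); the sample URLs are invented.

```python
import re
import sqlite3

def regexp(pattern, value):
    return 1 if value is not None and re.search(pattern, value) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE links (url TEXT)")
conn.executemany("INSERT INTO links (url) VALUES (?)",
                 [("example.com",), ("localhost",), ("foo.bar",), ("broken-link",)])

def urls(condition):
    # condition is a fixed SQL fragment from this demo, not user input
    return [row[0] for row in
            conn.execute(f"SELECT url FROM links WHERE {condition} ORDER BY url")]

print(urls("url REGEXP '[^.]'"))     # ['broken-link', 'example.com', 'foo.bar', 'localhost']
print(urls("url REGEXP '^[^.]*$'"))  # ['broken-link', 'localhost']
print(urls("url NOT LIKE '%.%'"))    # ['broken-link', 'localhost']
```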
Using a regexp seems like overkill here. A simple LIKE would do the trick:
SELECT * FROM `links` WHERE url NOT LIKE '%.%';
EDIT:
Having said that, if you really want to negate a regexp, just use NOT REGEXP:
SELECT * FROM `links` WHERE url NOT REGEXP '[.]';

Regexp inside of 'where' method

I need a method that will go through the database and return the matching results. In this case it's searching for books by author, title, publishing date or ISBN code. I decided to use the where() method, but I ran into two problems:
1) I have trouble searching by multiple fields. It's easy to search by title:
def self.browse(query)
if query.nil?
nil
else
self.where("title REGEXP :query", query: query)
end
end
but I don't know how to make it look for title OR author OR isbn etc. I tried
self.where("(title OR author OR publishing_date OR isbn) REGEXP :query", query: query)
but it doesn't work.
2) I want my query to match only the beginning or the end of the word. In MySQL Workbench it's pretty easy, but I have a hard time doing it in Rails. Here's what I've tried so far (and failed):
self.where("title REGEXP :query", query: /^(query)*$/)
self.where("title REGEXP /^:query/", query: query)
self.where("title REGEXP :query", query: $"query"^)
Needless to say, I found many different docs and tutorials on the internet: one says "^" should be at the end, another that it should be at the beginning...
1) You will want to use parentheses and both AND and OR clauses in your where sql:
(title IS NOT NULL AND title REGEXP :id_query) OR (name IS NOT NULL AND name REGEXP :name_query)
2) You will want to use both ^ (beginning of string) and $ (end of string), like this:
(^something|something$)
Here is an example of the whole thing, tested against my own code. Replace id and name with your own columns, and add extra ORs to match against more columns:
Charity.where("(id IS NOT NULL AND id REGEXP :id_query) OR (name IS NOT NULL AND name REGEXP :name_query)", id_query:'1', name_query:'(^a|a$)')
Here is the to_sql output of the above:
Charity.where("(id IS NOT NULL AND id REGEXP :id_query) OR (name IS NOT NULL AND name REGEXP :name_query)", id_query:'1', name_query:'(^a|a$)').to_sql
=> "SELECT `charities`.* FROM `charities` WHERE ((id IS NOT NULL AND id REGEXP '1') OR (name IS NOT NULL AND name REGEXP '(^a|a$)'))"
This should do it:
self.where("title REGEXP ? OR author REGEXP ? OR publishing_date REGEXP ? OR isbn REGEXP ?", query, query, query, query)
Each "?" is replaced, in order, by the values that follow the SQL string. To use the same regexp for every column, just plug the code in as-is.
As for the second part, you may want to check out the LIKE operator.
To match a column which starts with a given string you'd do:
self.where("title LIKE ?", (query + "%"))
And to match a column that ends in a particular string:
self.where("title LIKE ?", ("%" + query))
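Whatever Rails syntax is used, the generated SQL needs one REGEXP test per column, joined with OR. This sketch runs that SQL shape in SQLite through Python's sqlite3 module rather than through ActiveRecord; the books table, its rows (with placeholder ISBNs) and the case-insensitive REGEXP function are assumptions for illustration.

```python
import re
import sqlite3

def regexp(pattern, value):
    return 1 if value is not None and re.search(pattern, value, re.IGNORECASE) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE books (title TEXT, author TEXT, publishing_date TEXT, isbn TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", [
    ("Dune", "Frank Herbert", "1965", "9780000000001"),
    ("Emma", "Jane Austen", "1815", "9780000000002"),
    ("Neuromancer", "William Gibson", "1984", "9780000000003"),
])

# One condition per column, ORed together; the same named parameter is
# reused, much like four "?" placeholders bound to the same value.
SEARCH = """
SELECT title FROM books
WHERE title REGEXP :q OR author REGEXP :q
   OR publishing_date REGEXP :q OR isbn REGEXP :q
ORDER BY title
"""

def browse(query):
    return [row[0] for row in conn.execute(SEARCH, {"q": query})]

print(browse("1984"))     # ['Neuromancer']
print(browse("^jane"))    # ['Emma']
print(browse("(^a|a$)"))  # ['Emma']
```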
Alternatively, write the SQL query yourself and pass it to the connection's execute method, which runs the raw SQL without going through the ActiveRecord query interface:
sql = "your sql query"
ActiveRecord::Base.connection.execute(sql)
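You can use or (Rails 5 and later), which takes another relation rather than a SQL string:
class MyARModel < ActiveRecord::Base
scope :search, ->(rgx) do
where('title REGEXP ?', rgx)
.or(where('author REGEXP ?', rgx))
.or(where('publishing_date REGEXP ?', rgx))
.or(where('isbn REGEXP ?', rgx))
end
#...
end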

MySQL REGEXP: matching blank entries

I have this SQL condition that is supposed to retrieve all rows that satisfy the given regexp condition:
country REGEXP ('^(USA|Italy|France)$')
However, I need to add a pattern for retrieving all blank country values. Currently I am using this condition
country REGEXP ('^(USA|Italy|France)$') OR country = ""
How can I achieve the same effect without the OR clause?
Thanks,
Erwin
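This should work on MySQL 8.0, whose ICU regex engine accepts an empty alternative (older versions reject it with an "empty (sub)expression" error):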
country REGEXP ('^(USA|Italy|France|)$')
However from a performance point of view, you may want to use the IN syntax
country IN ('USA','Italy','France', '')
The latter should be faster: IN can use an index on country if there is one, while REGEXP has to test the pattern against every row.
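To confirm that the IN list selects the same rows as the original REGEXP-plus-OR condition, here is a quick sketch in SQLite via Python's sqlite3 module, with a REGEXP function registered from Python's re; the sample values are made up. Both conditions are case-sensitive here, whereas a case-insensitive MySQL collation would make both ignore case.

```python
import re
import sqlite3

def regexp(pattern, value):
    return 1 if value is not None and re.search(pattern, value) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE t (country TEXT)")
conn.executemany("INSERT INTO t (country) VALUES (?)",
                 [("",), ("abc",), ("France",), (" ",), ("USA",), ("Frances",), ("Italy",)])

def countries(condition):
    # condition is a fixed SQL fragment from this demo, not user input
    return [row[0] for row in
            conn.execute(f"SELECT country FROM t WHERE {condition} ORDER BY country")]

with_or = countries("country REGEXP '^(USA|Italy|France)$' OR country = ''")
with_in = countries("country IN ('USA', 'Italy', 'France', '')")
print(with_or)
print(with_in)
```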
There's no reason you can't use the $ (match end of string) to fill in your "empty subexpression" issue...
It looks a little weird but country REGEXP ('^(USA|Italy|France|$)$') will actually work
You could try:
country REGEXP ('^(USA|Italy|France|)$')
I just added another | after France, which tells it to also match ^$, the same as country = ''.
Update: since this method doesn't work, I would recommend you use this regex:
country REGEXP ('^(USA|Italy|France)$|^$')
Note that you can't use the regex: ^(USA|Italy|France|.{0})$ because it will complain that there is an empty sub expression. Although ^(USA|Italy|France)$|^.{0}$ would work.
Here are some examples of the return value of this regex:
select '' regexp '^(USA|Italy|France)$|^$'
> 1
select 'abc' regexp '^(USA|Italy|France)$|^$'
> 0
select 'France' regexp '^(USA|Italy|France)$|^$'
> 1
select ' ' regexp '^(USA|Italy|France)$|^$'
> 0
As you can see, it returns exactly what you want.
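If you want to treat all blank values the same (e.g. 0 spaces and 5 spaces both count as blank), you should use the regex:
country REGEXP ('^(USA|Italy|France|[[:space:]]*)$')
Avoid \s here: inside a MySQL string literal '\s' is read as a plain s, and MySQL versions before 8.0 do not support \s in patterns at all.
This will cause the last row in the previous example to behave differently, i.e.:
select ' ' regexp '^(USA|Italy|France|[[:space:]]*)$'
> 1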