This is my SQL Query:
SELECT filename
FROM video
WHERE MATCH (title, description) AGAINST
('sports' IN BOOLEAN MODE);
I'm searching the title and description fields for the word sports (case insensitive).
Now I want to count how many times the word appears in each of those fields independently, so that I get something like this: n_title=2, n_description=1.
I already have tried this query, and it works...
SELECT filename,
ROUND ((LENGTH(description) - LENGTH( REPLACE ( description, "sports", ""))) / LENGTH("sports")) AS count
FROM video
but it's not case insensitive, and when I type sports it doesn't come up with any results, because in the database it's "saved" as "Sports" (with an uppercase "S").
Now my problem is how I can combine these two queries and use them as one, so that I can search for any word case-insensitively and also count the number of occurrences in each field.
This is what you are looking for (fiddle):
SELECT
filename,
(
CHAR_LENGTH(title)
- CHAR_LENGTH( REPLACE(LOWER(title), "sports", "") )
) / CHAR_LENGTH("sports") AS cnt_title,
(
CHAR_LENGTH(description)
- CHAR_LENGTH( REPLACE(LOWER(description), "sports", ""))
) / CHAR_LENGTH("sports") AS cnt_desc
FROM video
WHERE MATCH (title, description) AGAINST ('sports' IN BOOLEAN MODE);
REPLACE is case-sensitive, so the trick is to run it on a lowercased version of the string (with the search word written in lowercase as well). Also, you should use CHAR_LENGTH instead of LENGTH. The former counts characters whereas the latter counts bytes (and you are using UTF8).
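The length-difference trick can also be sketched outside of SQL. The following is a minimal Python illustration of the same idea, not part of the original answer; the function name and the sample strings are made up for the example.

```python
def count_occurrences(text: str, word: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of word in text.

    Mirrors the SQL idea: lowercase the text, remove every occurrence of the
    word, and divide the number of removed characters by the word length.
    """
    lowered = text.lower()
    needle = word.lower()
    removed = lowered.replace(needle, "")
    return (len(lowered) - len(removed)) // len(needle)


# Hypothetical sample values standing in for the title and description columns.
title = "Sports news: SPORTS and more sports"
description = "Sports"
print(count_occurrences(title, "sports"), count_occurrences(description, "sports"))
```

Python's len counts characters rather than bytes, which corresponds to CHAR_LENGTH in the query above.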
I am trying to pull a product code from long strings formatted like URLs. The pattern is always 3 letters followed by 3 or 4 numbers (ex. ???### or ???####). I have tried using REGEXP and LIKE syntax, but my results are off for both, and I am not sure which operators to use.
The first select statement is close to trimming the URL to show just the code, but oftentimes will show a random string of numbers it may find in the URL string.
The second select statement is more rudimentary, but I am unsure which operators to use.
Which would be the quickest solution?
SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName
SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%'
-- Will take a substring of this result as well
Example Data:
randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc
In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.
EDIT
SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable
This expression is still not grabbing the right data, but I feel like it may get me closer.
I suggest using
REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')
Details:
(?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be ? or &, then random_code=, and then zero to 256 chars other than & and #, followed by a - char
[a-zA-Z]{3} - three ASCII letters
[0-9]{3,4} - three to four ASCII digits
(?![^&#]) - that are followed either with &, # or end of string.
See the online demo:
WITH cte AS ( SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc' val
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz4567&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz89&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz00000&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-aaaaa11111&hello_world=us&etc_etc')
SELECT REGEXP_SUBSTR(val,'(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') output
FROM cte
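For experimenting with the pattern outside MySQL, here is a rough Python sketch. Python's re module does not accept a variable-length lookbehind, so this version swaps the lookbehind for a capture group; that is an adaptation for illustration, not the pattern from the answer. The names PATTERN, extract_code, PREFIX and SUFFIX are made up for the example.

```python
import re

# Same building blocks as the MySQL pattern, but the left-hand context is
# matched normally and the code itself is taken from capture group 1.
PATTERN = re.compile(
    r"[&?]random_code=[^&#]{0,256}-([a-zA-Z]{3}[0-9]{3,4})(?![^&#])"
)


def extract_code(url):
    """Return the product code from the random_code parameter, or None."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


# The sample URLs from the demo query, split into shared parts.
PREFIX = "randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-"
SUFFIX = "&hello_world=us&etc_etc"

for tail in ("xyz123", "xyz4567", "xyz89", "xyz00000", "aaaaa11111"):
    print(tail, "->", extract_code(PREFIX + tail + SUFFIX))
```

Only the first two sample values have three letters followed by three or four digits and then a delimiter, so only those two yield a code here.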
I'd make use of capture groups:
(?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&])
I assume that with [^=\-] you wanted to match a string with "-", "\" or "=" in front, without including those characters in the result. To do that, use a positive lookbehind, (?<=...).
I also added a lookahead, (?=...), for "&".
If you'd like to experiment more with regex, I recommend RegExr.
I have a table which contains two fields. The first is name of type string. The second contains one or more strings separated by comma (but it can contain a single string with no commas at all)
I want to construct a query to know if the string in the name field does not exist in every comma separated strings in the names field.
Example 1:
---------------------------------------------------------
name names
---------------------------------------------------------
myname xmyname,myname,mynamey
All the comma-separated strings contain the word myname, so the query should not return this row.
But, Example 2:
---------------------------------------------------------
name names
---------------------------------------------------------
myname x,myname,mynamey
Should be returned. Because x does not contain myname.
The condition is that, if the string in the name field does not exist in every one of the comma-separated strings in the names field, then the row should be returned.
This is not correct as this query will not return true in example 2 (which contains x which does not contain myname).
IMPORTANT NOTE:
1) There is no limit on how many commas there are. It can be 0 commas or more. How to deal with this?
2) The strings are variables. It is not always the case that the string is myname. Each row contains a different string in the name field.
Try this regular expression:
where not concat(names, ',') regexp replace('^([^,]*{n}[^,]*,)*$', '{n}', name)
How to read the pattern:
The inner pattern [^,]*{n}[^,]*, means
Any non comma character [^,] repeated any number of times (* means no times or multiple times).
followed by the value of the column name ({n} is a placeholder and will be replaced with the actual value using the replace() function)
followed by any non comma character [^,] repeated any number of times
followed by a comma
The outer pattern ^({inner_pattern})*$ means
Start of the string (^)
followed by the inner pattern repeated any number of times
followed by end of string ($)
To make this work, a comma is appended to the names column (concat(names, ',')), so that every element in the string ends with a comma.
The pattern ensures that every element in the comma-separated string contains the value of the name column. Since you want the opposite result, we use where not ...
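The same pattern can be tried out in Python. This is only a sketch of the idea; the function name is made up, and re.escape is an addition not present in the SQL version (it guards against regex metacharacters in name).

```python
import re


def has_element_without_name(name, names):
    """True if at least one comma-separated element of names lacks name."""
    # Inner pattern: one element containing name, followed by a comma.
    # fullmatch plays the role of the ^ ... $ anchors in the SQL pattern.
    pattern = r"(?:[^,]*" + re.escape(name) + r"[^,]*,)*"
    # A trailing comma is appended so every element ends with a comma.
    return re.fullmatch(pattern, names + ",") is None


print(has_element_without_name("myname", "xmyname,myname,mynamey"))
print(has_element_without_name("myname", "x,myname,mynamey"))
```

The first call corresponds to Example 1 (row not returned) and the second to Example 2 (row returned).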
Assuming "myname" does not appear twice between two commas, you can count the commas and "myname"s:
where (length(names) - length(replace(names, ',', ''))) >=
length(names) - length(replace(names, 'myname', '12345'))
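As a rough illustration of the counting idea in Python (function name made up for the example): a row qualifies when there are more elements than occurrences of the name, which is the same as "commas >= occurrences".

```python
def fewer_matches_than_elements(name, names):
    """True if names has more comma-separated elements than occurrences of name.

    Relies on the stated assumption that name appears at most once per element.
    """
    elements = names.count(",") + 1
    occurrences = names.count(name)
    return elements > occurrences


print(fewer_matches_than_elements("myname", "xmyname,myname,mynamey"))
print(fewer_matches_than_elements("myname", "x,myname,mynamey"))
```

If the assumption is violated, the count can be fooled: "x,mynamemyname" has two elements and two occurrences, so it is not flagged even though x does not contain myname.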
This answer started off giving an incorrect REGEXP solution. But the best thing to do here would be to fix your data model, such that each name in the names column is actually on a separate row:
name | names
myname | xmyname
myname | myname
myname | mynamey
somename | x
somename | myname
somename | mynamey
Now we can do a simple aggregation query to answer your question:
SELECT name
FROM yourTable
GROUP BY name
HAVING COUNT(CASE WHEN names NOT LIKE CONCAT('%', name, '%') THEN 1 END) > 0;
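To try the aggregation without a MySQL server, here is a small sketch using Python's built-in sqlite3 module. SQLite is a stand-in here, and the || operator replaces MySQL's CONCAT; the rows are the sample rows from the table above.

```python
import sqlite3

# Sample rows from the normalized table shown above.
rows = [
    ("myname", "xmyname"),
    ("myname", "myname"),
    ("myname", "mynamey"),
    ("somename", "x"),
    ("somename", "myname"),
    ("somename", "mynamey"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (name TEXT, names TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)

# Keep each name that has at least one row whose names value lacks it.
result = conn.execute(
    """
    SELECT name
    FROM yourTable
    GROUP BY name
    HAVING COUNT(CASE WHEN names NOT LIKE '%' || name || '%' THEN 1 END) > 0
    """
).fetchall()
print(result)
conn.close()
```

With these rows only somename is reported, since none of its names values contain it, while every myname row does.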
You can approach this using the following SQL query:
SELECT
name, names
FROM
`tablename`
WHERE
(LENGTH(names) - LENGTH(REPLACE(names, ',', '')) + 1)
>
ROUND (
(
LENGTH(names)
- LENGTH( REPLACE ( names, name, "") )
)/ LENGTH(name)
);
Explanation:
This gives the number of comma-separated words:
(LENGTH(names) - LENGTH(REPLACE(names, ',', '')) + 1)
The following counts how many times name occurs in names:
ROUND (
(
LENGTH(names)
- LENGTH( REPLACE ( names, name, "") )
) / LENGTH(name)
)
I am trying to enhance an (awesome) third-party Django framework named django-watson, and I currently need to make my way through a MySQL feature that is so far unknown to me: MATCH (...) AGAINST (...).
So, I already know how to retrieve an exact phrase, which is doing:
SELECT *
FROM patient_db
WHERE MATCH ( Name, id_number )
AGAINST ('"exact phrase"' IN BOOLEAN MODE);
And I also know how to retrieve results that contain words from a list:
SELECT *
FROM patient_db
WHERE MATCH ( Name, id_number )
AGAINST ('+keyword1 +keyword2' IN BOOLEAN MODE);
But I need a third option that combines the two above. I'd like to do something like a Google search: "exact phrase" +keyword1 +keyword2.
PS: when I search for "exact phrase" -keyword1 it works exactly as desired.
Any Ideas?
Try this.
SELECT *
FROM patient_db
WHERE MATCH ( Name, id_number )
AGAINST ('+keyword1 +keyword2' IN BOOLEAN MODE)
OR MATCH ( Name, id_number )
AGAINST ('"exact phrase"' IN BOOLEAN MODE)
I'm building a database of photos.
Using a boolean search, I get the right images for the search phrase.
Now I have this problem:
Many of the descriptions include the email address of the person or subject, like my.name#telenet.be
Now, when I want to find an image of the company Telenet, I get all the records that contain an email address from that provider, which are of no importance to my search.
Here is a example of my search:
SELECT ID,auteur, title, datetaken, description,
keywords,thumbnail,URL,rel_date,showa,initialen,type
FROM archief
WHERE MATCH(description, keywords) AGAINST(' +telenet ' IN BOOLEAN
MODE) AND rechten <> '0' AND showa = '1' AND type >= '2' AND rel_date
<= '20120921' ORDER BY datetaken desc LIMIT 60'
in 'where clause'
What can I do to filter out the searchphrase containing an # symbol?
I think you can try the REGEXP operator; in your case it will look something like:
SELECT *
FROM archief
WHERE
description REGEXP '[^#]telenet' OR
keywords REGEXP '[^#]telenet'
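This query should select all records that contain the string telenet without a preceding # in the description or keywords fields. Note that [^#] must match a character, so a value that begins with telenet is not matched; '(^|[^#])telenet' covers that case as well.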
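One detail of [^#]telenet is easy to check in Python: the class [^#] has to consume a character, so text that starts with telenet slips through. The sketch below compares it with an anchored alternative, (^|[^#])telenet. The names are made up, and re.IGNORECASE is an assumption meant to mimic a case-insensitive column collation.

```python
import re

# Pattern from the answer, and a variant that also accepts start of string.
NAIVE = re.compile(r"[^#]telenet", re.IGNORECASE)
ANCHORED = re.compile(r"(^|[^#])telenet", re.IGNORECASE)


def mentions_company(text):
    """True if telenet occurs somewhere without a '#' right before it."""
    return ANCHORED.search(text) is not None


for sample in ("my.name#telenet.be", "Telenet headquarters", "Photo of Telenet HQ"):
    print(sample, NAIVE.search(sample) is not None, mentions_company(sample))
```

A description that holds both an address and a separate mention of the company still matches, which is probably the desired behaviour.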
SELECT *
FROM `thread`
WHERE forumid NOT IN (1,2,3) AND IF( LEFT( title, 1) = '#', 1, 0)
ORDER BY title ASC
I have this query, which selects rows whose title starts with a #. What I want is that, when # is given as the value, it looks for titles starting with numbers and special characters, or anything that is not a normal letter.
How would I do this?
If you want to select all the rows whose "title" does not begin with a letter, use REGEXP:
SELECT *
FROM thread
WHERE forumid NOT IN (1,2,3)
AND title NOT REGEXP '^[[:alpha:]]'
ORDER BY title ASC
NOT means "not" (obviously ;))
^ means "starts with"
[[:alpha:]] means "alphabetic characters only"
Find more about REGEXP in MySQL's manual.
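The condition itself is simple enough to sketch in Python for quick testing. Here str.isalpha stands in for [[:alpha:]]; which characters count as letters in MySQL may differ depending on the character set, so treat this as an approximation. The function name is made up.

```python
def starts_with_non_letter(title):
    """True if the first character of title is not a letter.

    An empty title has no leading letter, so it also counts as a hit.
    """
    return not title[:1].isalpha()


for sample in ("#announcements", "123 tips", "Alpha thread"):
    print(sample, starts_with_non_letter(sample))
```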
It's possible you can cast it as a char:
CAST('#' AS CHAR)
but I don't know if this will work for the octothorpe (aka pound symbol :) ), because that's the symbol for starting a comment in MySQL.
SELECT t.*
FROM `thread` t
WHERE t.forumid NOT IN (1,2,3)
AND INSTR(t.title, '#') = 0
ORDER BY t.title
Use INSTR to get the position of a given substring. It returns the 1-based position of the first occurrence, or 0 if the substring is not found, so the query above (= 0) selects titles that contain no # at all; to select titles that start with #, check for 1 instead.
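MySQL's INSTR is 1-based and returns 0 when the substring is absent. A tiny Python emulation makes the two cases easy to see; the helper name instr is made up, and it is built on str.find, which is 0-based and returns -1 on a miss.

```python
def instr(string, substring):
    """Emulate INSTR: 1-based position of the first match, 0 if not found."""
    return string.find(substring) + 1


# A result of 1 means "starts with"; 0 means "does not occur at all".
for title in ("#general", "news #2", "plain title"):
    print(title, instr(title, "#"))
```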