I'm making a search function for my website, which finds relevant results from a database. I'm looking for a way to count occurrences of a word, but I need to ensure that there are word boundaries on both sides of the word (so I don't end up with "triple" when I want "rip").
Does anyone have any ideas?
Edit, since people have misunderstood my question:
How can I count the number of such occurrences within a single row?
This is not the sort of thing that relational databases are very good at, unless you can use fulltext indexing, and you have already stated that you cannot, since you're using InnoDB. I'd suggest selecting your relevant rows and doing the word count in your application code.
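A minimal sketch of the application-side count suggested above, written in Python as an assumption (the original post does not say which language the site uses). The function name `count_word` and the sample sentence are purely illustrative. The regex anchor `\b` marks a word boundary, so "rip" is not counted inside "triple":

```python
import re

def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # re.escape guards against regex metacharacters in the search term;
    # \b matches at a word boundary, so "rip" does not match inside "triple".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

# Hypothetical row content fetched from the database.
print(count_word("Rip it: a triple rip, then rip again.", "rip"))
```

The rows would still be pre-filtered in SQL (for example with LIKE) so that only candidate rows are counted in the application.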
You can try this perverted way:
SELECT
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
FROM yourtable -- placeholder table name
ORDER BY `count` DESC
This query can be very slow
It looks pretty ugly
REPLACE() is case-sensitive
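To make the arithmetic and the caveats above concrete, here is a small sketch that runs the same LENGTH/REPLACE formula from Python. It uses an in-memory SQLite database as a stand-in for MySQL, which is an assumption made only so the example is runnable. The helper name `replace_count` and the sample text are illustrative.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption); for plain ASCII
# text LENGTH, REPLACE and LOWER give the same results in both.
conn = sqlite3.connect(":memory:")

def replace_count(text: str, word: str, lower: bool = False) -> int:
    """Apply (LENGTH(f) - LENGTH(REPLACE(f, w, ''))) / LENGTH(w) to one value."""
    col = "LOWER(?)" if lower else "?"
    sql = f"SELECT (LENGTH({col}) - LENGTH(REPLACE({col}, ?, ''))) / LENGTH(?)"
    # Parameters in order: field, field, search word, search word.
    return conn.execute(sql, (text, text, word, word)).fetchone()[0]

text = "rip the triple Rip"
print(replace_count(text, "rip"))              # case-sensitive REPLACE
print(replace_count(text, "rip", lower=True))  # lowercased before REPLACE
```

On this sample the case-sensitive form reports 2 (one of them inside "triple", while "Rip" is missed) and the lowercased form reports 3, whereas the whole-word answer is 2. The formula counts substrings, so it does not respect word boundaries.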
You can overcome the issue of MySQL's case-sensitive REPLACE() function by using LOWER().
It's sloppy, but on my end this query runs pretty fast.
To speed things along, I retrieve the result set in a SELECT that I have declared as a derived table in my 'outer' query. Since MySQL already has the results at that point, the replace method works pretty quickly.
I created a query similar to the one below to search for multiple terms in multiple tables and multiple columns. I obtain a 'relevance' number equal to the sum of the counts of all occurrences of all found search terms in all columns searched:
SELECT DISTINCT (
((length(x.ent_title) - length(replace(LOWER(x.ent_title),LOWER('there'),''))) / length('there'))
+ ((length(x.ent_content) - length(replace(LOWER(x.ent_content),LOWER('there'),''))) / length('there'))
+ ((length(x.ent_title) - length(replace(LOWER(x.ent_title),LOWER('another'),''))) / length('another'))
+ ((length(x.ent_content) - length(replace(LOWER(x.ent_content),LOWER('another'),''))) / length('another'))
) as relevance,
x.ent_type,
x.ent_id,
x.this_id as anchor,
page.page_name
FROM (
(SELECT
'Foo' as ent_type,
sp.sp_id as ent_id,
sp.page_id as this_id,
sp.title as ent_title,
sp.content as ent_content,
sp.page_id as page_id
FROM sp
WHERE (sp.title LIKE '%there%' OR sp.content LIKE '%there%' OR sp.title LIKE '%another%' OR sp.content LIKE '%another%' ) AND (sp.title NOT LIKE '%goes%' AND sp.content NOT LIKE '%goes%')
) UNION (
[search a different table here.....]
)
) as x
JOIN page ON page.page_id = x.page_id
WHERE page.rstatus = 'ACTIVE'
ORDER BY relevance DESC, ent_title;
Hope this helps someone
-- Seacrest out
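Create a stored function like this and use it in your query:
DELIMITER $$
CREATE FUNCTION `getCount`(myStr VARCHAR(1000), myword VARCHAR(100))
RETURNS INT
DETERMINISTIC
BEGIN
-- Counts non-overlapping occurrences of myword in myStr.
DECLARE cnt INT DEFAULT 0;
DECLARE result INT DEFAULT 1;
-- An empty search word would match forever, so bail out early.
IF CHAR_LENGTH(myword) = 0 THEN
RETURN 0;
END IF;
WHILE (result > 0) DO
SET result = INSTR(myStr, myword);
IF(result > 0) THEN
SET cnt = cnt + 1;
-- INSTR and SUBSTRING work in characters, so use CHAR_LENGTH rather than the byte-based LENGTH.
SET myStr = SUBSTRING(myStr, result + CHAR_LENGTH(myword));
END IF;
END WHILE;
RETURN cnt;
END$$
DELIMITER ;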
Hope it helps
Something like this should work:
select count(*) from table where fieldname REGEXP '[[:<:]]word[[:>:]]';
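The gory details are in the MySQL manual, section 11.4.2. Note that this counts the rows that contain the word, not the number of occurrences within a row. On MySQL 8.0.4 and later, which use ICU regular expressions, the [[:<:]] and [[:>:]] markers are not supported and the word boundary is written \\b instead.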
Something like LIKE or REGEXP will not scale (unless it's a leftmost prefix match).
Consider instead using a fulltext index for what you want to do.
select count(*) from yourtable where match(title, body) against ('some_word');
I have used the technique as described in the link below. The method uses length and replace functions of MySQL.
Keyword Relevance
If you want search, I would advise something like Sphinx or Lucene. I find Sphinx (as an independent full-text indexer) a lot easier to set up and run; it runs fast and generates its indexes very fast. Even if you were using MyISAM I would suggest it, since it has a lot more power than a MyISAM full-text index.
It can also integrate (somewhat) with MySQL.
It depends on which DBMS you are using; some allow writing UDFs that could do this.
Related
I've tried looking it up, and while I think this should be possible I can't seem to find the answer I need anywhere.
I need to lookup a date from one table, then store it for use in a following query.
Below are the statements that I think should work, including my attempt at setting the variable (which I know won't work, but I'm unsure of the best way to do or show it otherwise, bar maybe querying it twice inside the IF statement).
In the latter statement I then want to use either the date given in the second query or, if the date from the first query (which I'm thinking of storing in a variable) is newer, that date instead.
startDateVariable = (SELECT `userID`, `startDate`
FROM `empDetails`
WHERE `userID` = 1);
SELECT `userID`, SUM(`weeksGROSS`) AS yearGROSS
FROM `PAYSLIP`
WHERE `date` <= "2021-11-15"
AND `date` >= IF( "2020-11-15" > startDateVariable , "2020-11-15" , startDateVariable )
AND `userID` IN ( 1 )
GROUP BY `userID`
Naturally all dates given in the query ("2021-11-15" etc) would be inserted dynamically in the prepared statement.
Now while I've set the userID IN to just query 1, it'd be ideal if I can lookup multiple users this way at once, though I can accept that I may need to make an individual query per user doing it this way.
Much appreciated!
So it turns out I was going about this the wrong way; it looks like the best way to do this, or something similar, is to use an SQL JOIN.
This allows you to query the tables as if they were one.
I also realised that rather than using an IF, I could simply make sure the payslip date is newer than or equal to both the date given and the start date.
The query below works as required and allows looking up multiple users at once, as wanted.
SELECT PAYSLIP.userID, employeeDetails.startDate, SUM(PAYSLIP.weeksGROSS) AS yearGROSS
FROM PAYSLIP
INNER JOIN employeeDetails ON employeeDetails.userID=PAYSLIP.userID
WHERE PAYSLIP.date <= "2021-11-15"
AND PAYSLIP.date >= "2020-11-15"
AND PAYSLIP.date >= employeeDetails.startDate
AND PAYSLIP.userID IN ( 1,2,8 )
GROUP BY PAYSLIP.userID
See here for more usage examples: https://www.w3schools.com/sql/sql_join.asp
However, along the lines of my original question, it is also possible to store values in variables, e.g.
SET @myvar = 'Example showing how to declare variable';
Then use it in an SQL statement by writing
@myvar where you want the variable to go.
So I've been using views instead of result queries as entities in my project, and I know I'm not alone, so here is my question:
What do you use to act as an @Id when working with views? Sometimes the answer to that question will be trivial, but sometimes you don't have a unique field that stands out. What do you do then?
Right now I'm including more fields than I need in a particular view, so that I have a mix of fields that are unique together, and I use the @Id annotation on each of those. So far it's been working great.
It seems to be so contextual that I've been asking myself whether there is a more standard way of doing it.
I don't think there is a standard way, but here is an approach that seems worth trying.
The idea is to generate unique "id" values (an analog of rownum) on the fly for the view. Here is a slightly modified version of the function from "Create a view with column num_rows - MySQL" (modified so that rownum can be reset):
delimiter //
CREATE FUNCTION `func_inc_var_session`( val int) RETURNS int
NO SQL
NOT DETERMINISTIC
begin
-- val = 1 resets the session counter; any other value just increments it
if val = 1 THEN set @var := -1; end if;
SET @var := IFNULL(@var,0) + 1;
return @var;
end
//
delimiter ;
Say we have a view definition (oversimplified for illustration purposes):
CREATE VIEW v_test1 AS
SELECT a.field1
FROM test_table a
Modifying it to
CREATE VIEW v_test1 AS
SELECT a.field1, func_inc_var_session(0) as rownum
FROM test_table a
would do the job; however, running select * from v_test1 multiple times within one session will give you ever-increasing rownums: the first run starts at 1, the second continues from the number of records in the view, and so on.
To reset rownum I create another view (because of a MySQL view limitation: a view cannot have a subquery in its FROM clause):
CREATE VIEW v_reset AS SELECT func_inc_var_session(1) ;
Now we can do
CREATE VIEW v_test1 AS
SELECT a.field1, func_inc_var_session(0) as rownum
FROM test_table a, v_reset
(The FROM clause is processed first, so func_inc_var_session(1) is executed just once per query and resets rownum.)
I hope it helps.
I have a search query that is able to sort results by relevance according to how many of the words from the query actually show up.
SELECT id,
thesis
FROM activity p
WHERE p.discriminator = 'opinion'
AND ( thesis LIKE '%gun%'
OR thesis LIKE '%crucial%' )
ORDER BY ( ( CASE
WHEN thesis LIKE '%gun%' THEN 1
ELSE 0
end )
+ ( CASE
WHEN thesis LIKE '%crucial%' THEN 1
ELSE 0
end ) )
DESC
This query, however, does not sort according to how many times 'gun' or 'crucial' show up. I want records with more occurrences of 'gun' to show up above records with fewer occurrences (i.e., add a point for every time 'gun' shows up rather than a single point because it shows up at least once).
I might be wrong, but without the use of stored procedures or a UDF you won't be able to count string occurrences. Here's a sample stored function that counts substrings:
drop function if exists str_count;
delimiter |
create function str_count(sub varchar(255), str varchar(255)) RETURNS INTEGER
DETERMINISTIC NO SQL
BEGIN
DECLARE count INT;
DECLARE cur INT;
SET count = 0;
SET cur = 0;
REPEAT
SET cur = LOCATE(sub, str, cur+1);
SET count = count + (cur > 0);
UNTIL (cur = 0)
END REPEAT;
RETURN(count);
END|
delimiter ;
You might want to change varchar(255) to a larger VARCHAR or to TEXT, since longer strings would not fit in the parameter. You can now use it in the ORDER BY clause of the query:
SELECT id,
thesis
FROM activity p
WHERE p.discriminator = 'opinion'
AND ( thesis LIKE '%gun%'
OR thesis LIKE '%crucial%' )
ORDER BY STR_COUNT('gun',thesis) + STR_COUNT('crucial', thesis) DESC
If your dataset is large and performance is important to you, I suggest writing a custom UDF in C.
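If creating a stored function on the server is not an option, the same counting and ordering can be sketched in application code. The following Python version is an illustration only: it mirrors the LOCATE loop of str_count (overlapping matches are counted), lowercases both sides on the assumption that a case-insensitive match is wanted, and uses made-up sample rows and a hypothetical helper `rank_by_occurrences`.

```python
def str_count(sub: str, text: str) -> int:
    """Count occurrences of `sub` in `text`, overlapping ones included."""
    if not sub:
        return 0
    count = 0
    cur = text.find(sub)
    while cur != -1:
        count += 1
        # Advance by one character, like LOCATE(sub, str, cur + 1).
        cur = text.find(sub, cur + 1)
    return count

def rank_by_occurrences(rows, terms):
    """Sort (id, thesis) rows so that the most term occurrences come first."""
    def score(row):
        thesis = row[1].lower()
        return sum(str_count(term.lower(), thesis) for term in terms)
    return sorted(rows, key=score, reverse=True)

# Hypothetical rows, as they might come back from the LIKE-filtered query.
rows = [
    (1, "A gun is not crucial."),
    (2, "Gun laws: gun owners say gun safety is crucial."),
    (3, "Nothing relevant here."),
]
for row in rank_by_occurrences(rows, ["gun", "crucial"]):
    print(row)
```

Sorting in the application only makes sense when the filtered result set is small enough to fetch in full; with pagination the ordering has to happen in the database.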
Depending on how your database is set up, you may find MySQL's full text indexing to be a better fit for your use case. It allows you to index fields and search for words in them, ordering the results by relevance related to the number of occurrences.
See the documentation here: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
This is a useful question that gives some examples, and may help: How can I manipulate MySQL fulltext search relevance to make one field more 'valuable' than another?
Finally, if full text searches aren't an option for you, the comment posted by Andrew Hanna on the string functions reference may do the trick: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html (search the page for "Andrew Hanna"). They create a function on the server which can count the number of times a string occurs.
Hope this helps.
I am trying to query a table in mysql based on the length of a string in a specific column. I know mysql has a function called LENGTH(), but that returns the length of the string. I want to be able to pull data based on the result of the LENGTH() function.
Example:
SELECT * table WHERE LENGTH(word) = 6
Of course that does not work. I read through http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function%5Flength but could not find anything to help me.
Yes, I could write something in PHP to accomplish this, but I would like to do it at the query level.
Any help?
Try:
SELECT *
FROM table
WHERE LENGTH(RTRIM(word)) = 6
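I believe you wanted to use the query SELECT * FROM tableName WHERE LENGTH(word) = 6; (assuming that word is the name of a column in tableName).
On large tables this is an unfortunate solution, because LENGTH(word) has to be evaluated for every row and an index on word cannot be used. There you should create a new column and populate it with UPDATE tableName SET wordLength = LENGTH(word), then index and filter on wordLength.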
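The stored-length idea can be tried out end to end with a short sketch. It is written in Python against an in-memory SQLite database, which is an assumption made only so the example runs anywhere; the table name `words`, the index name and the sample data are hypothetical, and the MySQL statements would look much the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [("apple",), ("banana",), ("cherry",), ("fig",)],
)

# One-time backfill of the stored length, followed by an index on it.
conn.execute("ALTER TABLE words ADD COLUMN wordLength INTEGER")
conn.execute("UPDATE words SET wordLength = LENGTH(word)")
conn.execute("CREATE INDEX idx_words_length ON words (wordLength)")

# The filter now compares a plain indexed column instead of calling LENGTH().
six_letter = [
    row[0]
    for row in conn.execute(
        "SELECT word FROM words WHERE wordLength = 6 ORDER BY word"
    )
]
print(six_letter)
```

The extra column has to be kept in sync whenever word is inserted or updated, for example by the application or by a trigger.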
From working on this specific situation, it was news to me that the logic operators are not short circuited in SQL.
I routinely do something along these lines in the where clause (usually when dealing with search queries):
WHERE
(@Description IS NULL OR @Description = myTable.Description)
Which, even if it's not short-circuited in this example, doesn't really matter. However, when dealing with the fulltext search functions, it does matter. If the second part of that query was CONTAINS(myTable.Description, @Description), it wouldn't work because the variable is not allowed to be null or empty for these functions.
I found out the WHEN statements of CASE are executed in order, so I can change my query like so to ensure the fulltext lookup is only called when needed, along with changing the variable from null to '""' when it is null to allow the query to execute:
WHERE
(CASE WHEN @Description = '""' THEN 1 WHEN CONTAINS(myTable.Description, @Description) THEN 1 ELSE 0 END = 1)
The above code should prevent the full-text query piece from executing unless there is actually a value to search with.
My question is, if I run this query where @Description is '""', there is still quite a bit of time in the execution plan spent dealing with clustered index seeks and fulltextmatch, even though that table and search does not end up being used at all: is there any way to avoid this?
I'm trying to get this out of a hardcoded dynamic query and into a stored procedure, but if the procedure ends up being slower, I'm not sure I can justify it.
It's not ideal, but maybe something like this would work:
IF @Description = ''
BEGIN
SELECT ...
END
ELSE
BEGIN
SELECT ...
WHERE CONTAINS(mytable.description, @Description)
END
That way you avoid dynamic SQL and also avoid running the full-text scan when it's not needed.
As a few general notes, I usually find CONTAINSTABLE to be a bit faster. Also, since the query plan is going to be very different whether you're using my solution or yours, watch out for parameter sniffing. Parameter sniffing is when the optimizer builds a plan based on a passed in specific parameter value.
In case anyone else runs into a scenario like this, this is what I ended up doing, which is pretty close to what M_M was getting at; I broke away the full-text pieces and placed them behind branches:
DECLARE @TableBfullSearch TABLE (TableAId int)
IF(@TableBSearchInfo IS NOT NULL)
INSERT INTO @TableBfullSearch
SELECT
TableAId
FROM
TableB
WHERE
...(fulltext search)...
DECLARE @TableCfullSearch TABLE (TableAId int)
IF(@TableCSearchInfo IS NOT NULL)
INSERT INTO @TableCfullSearch
SELECT
TableAId
FROM
TableC
WHERE
...(fulltext search)...
--main query with this addition in the where clause
SELECT
...
FROM
TableA
WHERE
...
AND (@TableBSearchInfo IS NULL OR TableAId IN (SELECT TableAId FROM @TableBfullSearch))
AND (@TableCSearchInfo IS NULL OR TableAId IN (SELECT TableAId FROM @TableCfullSearch))
I think that's probably about as good as it'll get without some sort of dynamic query.