Counting occurrences of a word in a single row - mysql

I have a search query that is able to sort results by relevance according to how many of the words from the query actually show up.
SELECT id,
thesis
FROM activity p
WHERE p.discriminator = 'opinion'
AND ( thesis LIKE '%gun%'
OR thesis LIKE '%crucial%' )
ORDER BY ( ( CASE
WHEN thesis LIKE '%gun%' THEN 1
ELSE 0
end )
+ ( CASE
WHEN thesis LIKE '%crucial%' THEN 1
ELSE 0
end ) )
DESC
However, this query does not sort by how many times 'gun' or 'crucial' show up. I want records with more occurrences of 'gun' to appear above records with fewer occurrences (i.e., add a point for every time 'gun' shows up, rather than a single point because it shows up at least once).

I might be wrong, but without a stored function or UDF you won't be able to count string occurrences. Here's a sample stored function that counts substrings:
drop function if exists str_count;
delimiter |
create function str_count(sub varchar(255), str varchar(255)) RETURNS INTEGER
DETERMINISTIC NO SQL
BEGIN
DECLARE count INT;
DECLARE cur INT;
SET count = 0;
SET cur = 0;
REPEAT
SET cur = LOCATE(sub, str, cur+1);
SET count = count + (cur > 0);
UNTIL (cur = 0)
END REPEAT;
RETURN(count);
END|
delimiter ;
You might want to change varchar(255) to a longer type such as TEXT so that long thesis values are not truncated. You can now use it in the ORDER BY clause:
SELECT id,
thesis
FROM activity p
WHERE p.discriminator = 'opinion'
AND ( thesis LIKE '%gun%'
OR thesis LIKE '%crucial%' )
ORDER BY STR_COUNT('gun', thesis) + STR_COUNT('crucial', thesis) DESC
If your dataset is large and performance matters to you, I suggest writing a custom UDF in C.
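To see the ranking behave as intended, here is a minimal Python sketch that ports the str_count loop and registers it with SQLite through sqlite3's create_function, since MySQL itself can't run here. SQLite stands in for MySQL, the table rows are made up, and the ORDER BY is the one above with an id tie-breaker added.

```python
import sqlite3


def str_count(sub, s):
    """Python port of the str_count() stored function.

    Like the MySQL loop, each search restarts one character after the previous
    match, so overlapping matches are counted: 'aa' occurs twice in 'aaa'.
    """
    if not sub or s is None:
        return 0  # guard added for this sketch; the stored function has no such check
    count = 0
    cur = -1  # 0-based position of the last match (LOCATE is 1-based)
    while True:
        cur = s.find(sub, cur + 1)
        if cur < 0:
            return count
        count += 1


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL as str_count(sub, str)
conn.create_function("str_count", 2, str_count)
conn.execute(
    "CREATE TABLE activity (id INTEGER PRIMARY KEY, discriminator TEXT, thesis TEXT)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [
        (1, "opinion", "gun control is crucial"),
        (2, "opinion", "gun gun gun"),
        (3, "opinion", "crucial, crucial point about a gun"),
        (4, "fact", "gun gun gun gun"),  # filtered out by the discriminator
    ],
)
ranked = conn.execute(
    """
    SELECT id, str_count('gun', thesis) + str_count('crucial', thesis) AS score
    FROM activity p
    WHERE p.discriminator = 'opinion'
      AND (thesis LIKE '%gun%' OR thesis LIKE '%crucial%')
    ORDER BY score DESC, id
    """
).fetchall()
print(ranked)  # [(2, 3), (3, 3), (1, 2)]
```

Without DESC the rows with the fewest occurrences would come first, which is the opposite of what the question asks for.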

Depending on how your database is set up, you may find MySQL's full text indexing to be a better fit for your use case. It allows you to index fields and search for words in them, ordering the results by relevance related to the number of occurrences.
See the documentation here: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
This is a useful question that gives some examples, and may help: How can I manipulate MySQL fulltext search relevance to make one field more 'valuable' than another?
Finally, if full text searches aren't an option for you, the comment posted by Andrew Hanna on the string functions reference may do the trick: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html (search the page for "Andrew Hanna"). They create a function on the server which can count the number of times a string occurs.
Hope this helps.

Related

Max of Max on MYSQL

I'm making an Acyclic Graph database.
TABLE Material (id_item,id_collection,...)
PRIMARY KEY(id_item,id_collection)
(item can be collection itself, item can be collection of collection)
My constraint is id_collection > id_item (to prevent some cycles, as a first step).
So before inserting, I need to know "Max(Max(id_item), Max(id_collection))".
I can get the two values like this, but I can't get the max of them:
SELECT max(id_collection)
FROM material
UNION
SELECT max(id_item)
FROM Material
I tried this as well:
DELIMITER $$
CREATE PROCEDURE `findmax`
(
)
BEGIN
DECLARE max_item SMALLINT;
DECLARE max_collection SMALLINT;
DECLARE max_of_both SMALLINT;
SELECT MAX(id_item)
INTO max_item
FROM material
SELECT MAX(id_collection)
INTO max_collection
FROM material
SET max_of_both = MAX(max_item, max_collection)
END$$
DELIMITER ;
I'm running out of gas. Does anyone have an idea?
Best regards,
Falt
N.B. Two useful sources about acyclic graphs:
Database Soup : Trigger prevent cycles in PostgreSQL
CodeProject : Acyclic Graph Modelisation
You should be able to use the GREATEST() function in MySQL.
Try this:
SELECT GREATEST(
(SELECT MAX(id_item) FROM material),
(SELECT MAX(id_collection) FROM material));
This will select the largest item, whether that's the MAX(id_item) or MAX(id_collection).
EDIT
Something that may look a little cleaner: GREATEST() returns the largest of the arguments it is passed, so on its own it returns one value per row in the table (whichever of id_item and id_collection is larger in that row). Wrapping GREATEST() inside MAX() then gives the overall maximum:
SELECT MAX(GREATEST(id_item, id_collection))
FROM material;
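SQLite has no GREATEST(), but its multi-argument scalar max() plays the same role, so both variants can be tried from Python's sqlite3 module. This is a sketch with made-up rows, not MySQL itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE material (id_item INTEGER, id_collection INTEGER,"
    " PRIMARY KEY (id_item, id_collection))"
)
conn.executemany(
    "INSERT INTO material VALUES (?, ?)", [(1, 5), (2, 5), (5, 9), (3, 4)]
)

# Variant 1: the larger of the two column maxima (GREATEST of two subqueries)
(from_subqueries,) = conn.execute(
    "SELECT max((SELECT max(id_item) FROM material),"
    " (SELECT max(id_collection) FROM material))"
).fetchone()

# Variant 2: per-row larger value (scalar max), then the aggregate max over all rows
(from_rows,) = conn.execute(
    "SELECT max(max(id_item, id_collection)) FROM material"
).fetchone()

print(from_subqueries, from_rows)  # 9 9
```

If the id_collection > id_item constraint holds for every row, MAX(id_collection) alone is already the answer, because each id_item is smaller than the id_collection in its own row.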
Here is an SQL Fiddle example with both.

Mapping MySQL views to JPA Entities, which unique id to use?

So I've been using views instead of result queries as entities in my project, and I know I'm not alone, so here is my question:
What do you use as an @Id when working with views? Sometimes the answer to that question is trivial, but when you don't have a unique field that stands out, what do you do?
Right now I'm including more fields than I need in a particular view, so that I have a mix of fields that are unique together, and I put the @Id annotation on each of those. So far it's been working great.
It seems so contextual that I've been asking myself whether there is a more standard way of doing it.
I don't think there is a standard way, but here is an approach that seems worth trying.
The idea is to generate unique "id" values (an analog of rownum) on the fly for the view. This is a slightly modified version of the function from Create a view with column num_rows - MySQL (modified so that rownum can be reset):
delimiter //
CREATE FUNCTION `func_inc_var_session`( val int) RETURNS int
NO SQL
NOT DETERMINISTIC
begin
-- val = 1 resets the counter (used by v_reset); val = 0 just increments it
if val = 1 THEN set @var := -1; end if;
SET @var := IFNULL(@var,0) + 1;
return @var;
end
//
delimiter ;
Say we have a view definition (oversimplified for illustration purposes):
CREATE VIEW v_test1 AS
SELECT a.field1
FROM test_table a
Modifying it to:
CREATE OR REPLACE VIEW v_test1 AS
SELECT a.field1, func_inc_var_session(0) as rownum
FROM test_table a
would do the job; however, running select * from v_test1 multiple times within one session gives you continuing rownums: the first time they start at 1, the second time at the number of records in the view plus 1, and so on.
To reset rownum, I create another view (because of a MySQL view limitation before 5.7.7: a view cannot have a subquery in FROM):
CREATE VIEW v_reset AS SELECT func_inc_var_session(1) ;
Now we can do
CREATE OR REPLACE VIEW v_test1 AS
SELECT a.field1, func_inc_var_session(0) as rownum
FROM test_table a, v_reset
(The FROM clause is processed first, so func_inc_var_session(1) is executed just once during the query and resets rownum.)
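The reset trick depends only on the order in which the function is called, so it can be modelled without a database. The Python sketch below mimics @var and func_inc_var_session (with the reset on val = 1). It assumes, as the answer does, that the single v_reset row is evaluated once before the view's rows; that ordering is an assumption, not documented MySQL behavior.

```python
class Session:
    """Models one MySQL session's @var and func_inc_var_session()."""

    def __init__(self):
        self.var = None  # an unset user variable reads as NULL

    def func_inc_var_session(self, val):
        if val == 1:  # reset call, as issued once by v_reset
            self.var = -1
        # SET @var := IFNULL(@var, 0) + 1
        self.var = (0 if self.var is None else self.var) + 1
        return self.var


def select_from_view(session, n_rows, join_reset):
    """Rownums that one SELECT over an n_rows view would produce."""
    if join_reset:
        session.func_inc_var_session(1)  # evaluated once, before the rows
    return [session.func_inc_var_session(0) for _ in range(n_rows)]


s = Session()
print(select_from_view(s, 3, join_reset=False))  # [1, 2, 3]
print(select_from_view(s, 3, join_reset=False))  # [4, 5, 6]
print(select_from_view(s, 3, join_reset=True))   # [1, 2, 3]
```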
I hope it helps.

Save data from table into a variable and use it inside a function (make a data set)

Basically I want to make a data set like in PHP, where I can store the return of a select statement in a variable and then use it to do logical decisions.
Here is what I am trying:
DROP FUNCTION cc_get_balance(date);
CREATE OR REPLACE FUNCTION cc_get_balance(theDate date) RETURNS TABLE(balance numeric(20,10), rate numeric(20,10), final_balance numeric(20,10)) AS $$
DECLARE
currency1_to_EUR numeric(20,10);
currency2_to_EUR numeric(20,10);
table_ret record;
BEGIN
currency1_to_EUR := (SELECT rate FROM cc_getbalancesfordatewitheurs(theDate) WHERE from_currency = 'currency1' AND to_currency = 'EUR');
currency2_to_EUR := (SELECT rate FROM cc_getbalancesfordatewitheurs(theDate) WHERE from_currency = 'currency2' AND to_currency = 'EUR');
SELECT * INTO table_ret FROM cc_getbalancesfordatewitheurs(theDate);
END;
$$ LANGUAGE 'plpgsql';
SELECT * FROM cc_get_balance('2014-02-15'::date);
I don't know if this is right. I want to be able to use table_ret as a data set like:
select * from table_ret ...
So I don't have to make a lot of queries to the database. I have looked for examples of this and have not found anything like what I need or want to do.
The version is 9.3.4. cc_getbalancesfordatewitheurs() returns a table with columns from_currency, to_currency, rate, exchange, balance and converted_amount, with around 30 rows. I need to run through the to_currency column and do some other conversions based on the currencies listed in that column, and I did not want to query the database 30 times for the conversions. All the data I need is collected in the table returned by cc_getbalancesfordatewitheurs().
cc_get_balance() should return all the rows from the other function's table, along with a column that does a final conversion of the to_currency into EUR.
Generally, there are no "table variables" in PL/pgSQL. You could use a cursor or a temporary table.
Better yet, use the implicit cursor of a FOR loop.
Better still, if possible, do it all in a single set-based operation: one query.
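A rough illustration of the set-based version, run against SQLite from Python: the ~30 rows from cc_getbalancesfordatewitheurs() are modelled as a plain balances table, and the final_balance rule (converted_amount times the to_currency's EUR rate) is a guess at the requirement, not something the question spells out. In PostgreSQL the same LEFT JOIN could be written over a CTE that calls the function once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the rows returned by cc_getbalancesfordatewitheurs(theDate)
conn.execute(
    "CREATE TABLE balances (from_currency TEXT, to_currency TEXT,"
    " rate REAL, balance REAL, converted_amount REAL)"
)
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?, ?, ?)",
    [
        ("USD", "GBP", 0.8, 100.0, 80.0),
        ("GBP", "EUR", 1.2, 50.0, 60.0),
        ("CHF", "EUR", 0.9, 10.0, 9.0),
    ],
)

# One query: join every row to the "<to_currency> -> EUR" row of the same set
rows = conn.execute(
    """
    SELECT b.from_currency, b.to_currency, b.converted_amount,
           b.converted_amount
             * CASE WHEN b.to_currency = 'EUR' THEN 1.0 ELSE e.rate END
             AS final_balance
    FROM balances b
    LEFT JOIN balances e
           ON e.from_currency = b.to_currency AND e.to_currency = 'EUR'
    ORDER BY b.from_currency
    """
).fetchall()
for row in rows:
    print(row)
```

Rows whose to_currency has no EUR rate in the set get a NULL final_balance, which makes missing rates easy to spot.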
Related example:
Cursor based records in PostgreSQL

Search comma separated string in Column t-sql

In a property management system, I'm saving buyers based on their preferences. Say a person is interested in houses with 2 to 4 bedrooms, so I saved it as 2,3,4. Please see the attachment.
When searching, say someone is looking for buyers who are interested in houses with more than 2 bedrooms: how should I write the SELECT statement to check the bedroom column?
If someone searches for buyers who are interested in houses with more than 2 bathrooms, what could the SELECT statement be?
I still think a min/max is the much better table structure, but if you can't change it, try this.
First, let's come up with some base rules. If any of these is violated, then the final answer will need modification (and will probably be more complicated).
If a string contains + anywhere in it, the maximum is infinity.
The + can only occur at the end of the string.
The list will always be comma separated integers.
The smallest number will always be first in the list.
The largest number will always be last in the list.
If all those are true, then after a LOT of work, I think I have something you can use. The basic idea was to come up with a pair of functions that will get the min/max values out of your strings. Once you have these functions, you can use them in WHERE clauses.
See this SQL Fiddle for starters. Function definitions on the left, a sample query to give you the gist of how they work on the right.
CREATE FUNCTION dbo.list_min(@list_str AS VARCHAR(MAX))
RETURNS INT
WITH RETURNS NULL ON NULL INPUT
AS BEGIN
DECLARE @comma_index INT;
SET @comma_index = CHARINDEX(',', @list_str);
DECLARE @result INT;
IF (0 < @comma_index)
SET @result = CONVERT(INT, LEFT(@list_str, @comma_index - 1));
ELSE
SET @result = CONVERT(INT, REPLACE(@list_str, '+', ''));
RETURN @result;
END;
and
CREATE FUNCTION dbo.list_max(@list_str AS VARCHAR(MAX))
RETURNS INT
WITH RETURNS NULL ON NULL INPUT
AS BEGIN
IF (@list_str LIKE '%+')
RETURN 2147483647; -- Max INT
DECLARE @comma_index INT;
SET @comma_index = CHARINDEX(',', REVERSE(@list_str));
DECLARE @result INT;
IF (0 < @comma_index)
SET @result = CONVERT(INT, RIGHT(@list_str, @comma_index - 1));
ELSE
SET @result = CONVERT(INT, @list_str);
RETURN @result;
END;
(If anyone can think of a way to get rid of that ridiculous result variable, please let me know. I was getting errors about the last statement "must be a return statement" when putting the return inside IF/ELSE, and I couldn't get a CASE syntax working.)
With these in hand, you can do queries like this:
SELECT *
FROM stuff
WHERE dbo.list_min(carspace) <= 2 AND 2 <= dbo.list_max(carspace)
which will only select your second row. (SQL Fiddle of this query.)
A third function you might find useful is one that gives you the max in the list but ignores the +. That is essentially list_max without the IF block that checks for +. To get that functionality, you might want to remove the + check from list_max, and create another function that checks for + and calls list_max if there's no +.
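A quick way to sanity-check the parsing rules listed above is a Python port of the three functions. This is not T-SQL; list_last is a hypothetical name for the third function (max ignoring +), and the carspace values are invented.

```python
INT_MAX = 2147483647  # the sentinel list_max returns for a trailing '+'


def list_min(list_str):
    """First value of e.g. '2,3,4' or '4+' (None in, None out)."""
    if list_str is None:
        return None
    return int(list_str.split(",", 1)[0].replace("+", ""))


def list_last(list_str):
    """Largest listed value, ignoring a trailing '+' ('3,4+' -> 4)."""
    if list_str is None:
        return None
    return int(list_str.rsplit(",", 1)[-1].replace("+", ""))


def list_max(list_str):
    """Like list_last, but a trailing '+' means 'no upper bound'."""
    if list_str is None:
        return None
    if list_str.endswith("+"):
        return INT_MAX
    return list_last(list_str)


# Hypothetical carspace values; keep rows whose range contains 2
carspace = {"row1": "1", "row2": "1,2,3", "row3": "3,4+"}
matches = [k for k, v in carspace.items() if list_min(v) <= 2 <= list_max(v)]
print(matches)  # ['row2']
```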
I'm not sure about the performance characteristics here; I imagine they aren't great. If you have a large amount of data to search through, you might want to consider indexing computed columns built on these functions.
Good luck. Hope this helps.
Wouldn't it make more sense to store a min and max and then query with <= and >=?
Your table design is not optimal. You can simply store the number of bedrooms as an integer. Instead of "1,2,3,4", you would be looking at just 4 in that case.
To answer the particular question, though, you can use the REPLACE trick to count the number of commas in the column (one less than the number of values in the list):
SELECT * FROM myTable WHERE LEN(col) - LEN(REPLACE(col, ',','')) >= someNumber
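The comma-count arithmetic is easy to check outside the database; this Python sketch mirrors LEN(col) - LEN(REPLACE(col, ',', '')) on made-up bedroom lists. One difference: T-SQL's LEN ignores trailing spaces and Python's len does not, which doesn't matter for comma-separated integers.

```python
def comma_count(col):
    # Same arithmetic as LEN(col) - LEN(REPLACE(col, ',', ''))
    return len(col) - len(col.replace(",", ""))


bedrooms = ["1,2,3,4", "1,2", "1,2,3,4,5"]
# someNumber = 3 commas, i.e. lists with at least four values
print([b for b in bedrooms if comma_count(b) >= 3])  # ['1,2,3,4', '1,2,3,4,5']
```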

Count occurrences of a word in a row in MySQL

I'm making a search function for my website, which finds relevant results from a database. I'm looking for a way to count occurrences of a word, but I need to ensure that there are word boundaries on both sides of the word (so I don't match "triple" when I want "rip").
Does anyone have any ideas?
People have misunderstood my question:
How can I count the number of such occurrences within a single row?
This is not the sort of thing that relational databases are very good at, unless you can use fulltext indexing, and you have already stated that you cannot, since you're using InnoDB. I'd suggest selecting your relevant rows and doing the word count in your application code.
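If you do the counting in application code, a regular expression with \b anchors handles the word-boundary requirement. A minimal Python sketch, assuming the rows have already been fetched as (id, text) pairs and that the search word starts and ends with a letter or digit (\b behaves differently next to punctuation):

```python
import re


def count_word(text, word):
    """Count case-insensitive whole-word occurrences of word in text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Rows as they might come back from the database: (id, text)
rows = [(1, "triple trip"), (2, "Rip and rip again"), (3, "a rip")]
ranked = sorted(rows, key=lambda r: count_word(r[1], "rip"), reverse=True)
print([r[0] for r in ranked])  # [2, 3, 1]
```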
You can try this hacky way:
SELECT
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
FROM your_table -- placeholder: the table you are searching
ORDER BY `count` DESC
This query can be very slow
It looks pretty ugly
REPLACE() is case-sensitive
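The same LENGTH/REPLACE arithmetic runs in SQLite, which makes it easy to try from Python. This sketch applies LOWER() on both sides (the fix for case-sensitivity mentioned next) to invented rows; the third row shows the substring caveat, since "sword" counts as one "word".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO docs (field) VALUES (?)",
    [("Word word WORD",), ("a word",), ("sword play",)],
)
# Characters removed by REPLACE, divided by the word length = occurrences.
# LOWER() on both sides makes the match case-insensitive.
rows = conn.execute(
    """
    SELECT id,
           (length(field) - length(replace(lower(field), lower('word'), '')))
             / length('word') AS occurrences
    FROM docs
    ORDER BY occurrences DESC, id
    """
).fetchall()
print(rows)  # [(1, 3), (2, 1), (3, 1)]
```

SQLite's / between integers truncates; in MySQL, / returns a decimal (e.g. 3.0000), and DIV gives an integer. Either sorts the same way.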
You can overcome the issue of mysql's case-sensitive REPLACE() function by using LOWER().
It's sloppy, but on my end this query runs pretty fast.
To speed things along, I retrieve the result set in a SELECT that I declare as a derived table in my 'outer' query. Since MySQL already has the results at this point, the REPLACE method works pretty quickly.
I created a query similar to the one below to search for multiple terms in multiple tables and multiple columns. I obtain a 'relevance' number equal to the sum of the counts of all occurrences of all found search terms in all columns searched:
SELECT DISTINCT (
((length(x.ent_title) - length(replace(LOWER(x.ent_title),LOWER('there'),''))) / length('there'))
+ ((length(x.ent_content) - length(replace(LOWER(x.ent_content),LOWER('there'),''))) / length('there'))
+ ((length(x.ent_title) - length(replace(LOWER(x.ent_title),LOWER('another'),''))) / length('another'))
+ ((length(x.ent_content) - length(replace(LOWER(x.ent_content),LOWER('another'),''))) / length('another'))
) as relevance,
x.ent_type,
x.ent_id,
x.this_id as anchor,
page.page_name
FROM (
(SELECT
'Foo' as ent_type,
sp.sp_id as ent_id,
sp.page_id as this_id,
sp.title as ent_title,
sp.content as ent_content,
sp.page_id as page_id
FROM sp
WHERE (sp.title LIKE '%there%' OR sp.content LIKE '%there%' OR sp.title LIKE '%another%' OR sp.content LIKE '%another%' ) AND (sp.title NOT LIKE '%goes%' AND sp.content NOT LIKE '%goes%')
) UNION (
[search a different table here.....]
)
) as x
JOIN page ON page.page_id = x.page_id
WHERE page.rstatus = 'ACTIVE'
ORDER BY relevance DESC, ent_title;
Hope this helps someone
-- Seacrest out
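Create a user-defined function like this and use it in your query:
DELIMITER $$
CREATE FUNCTION `getCount`(myStr VARCHAR(1000), myword VARCHAR(100))
RETURNS INT
DETERMINISTIC NO SQL
BEGIN
DECLARE cnt INT DEFAULT 0;
DECLARE result INT DEFAULT 1;
-- an empty search word would make INSTR() match forever
IF CHAR_LENGTH(myword) = 0 THEN RETURN 0; END IF;
WHILE (result > 0) DO
SET result = INSTR(myStr, myword);
IF(result > 0) THEN
SET cnt = cnt + 1;
-- CHAR_LENGTH (characters), not LENGTH (bytes): SUBSTRING positions count characters
SET myStr = SUBSTRING(myStr, result + CHAR_LENGTH(myword));
END IF;
END WHILE;
RETURN cnt;
END$$
DELIMITER ;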
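Unlike the str_count function at the top of this page, this loop resumes after the whole match, so matches never overlap. A Python port makes the difference visible; it is a sketch, and Python's find() is case-sensitive, whereas MySQL's INSTR() follows the string collation (often case-insensitive).

```python
def get_count(my_str, my_word):
    """Python port of getCount(): find, count, then skip past the whole match."""
    if my_str is None or not my_word:
        return 0  # mirrors the empty-word guard (and NULL input ending the loop)
    cnt = 0
    while True:
        result = my_str.find(my_word)  # like INSTR(), but 0-based and -1 on a miss
        if result < 0:
            return cnt
        cnt += 1
        my_str = my_str[result + len(my_word):]


print(get_count("the cat sat on the mat", "at"))  # 3
print(get_count("aaa", "aa"))  # 1 (the overlapping str_count would give 2)
```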
Hope it helps
Something like this should work:
select count(*) from table where fieldname REGEXP '[[:<:]]word[[:>:]]';
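SQLite's REGEXP operator calls a user function named regexp(pattern, value), so the word-boundary query can be tried from Python with the re module supplying the regex engine. Note that it counts matching rows, not occurrences within a row: "rip rip rip" contributes 1. The table and rows are invented.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): pattern first, then value
    if value is None:
        return False
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE t (fieldname TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)", [("rip it",), ("triple",), ("rip rip rip",)]
)
# \b marks a word boundary in Python's re (SQLite leaves the backslashes alone)
(n,) = conn.execute(
    r"SELECT count(*) FROM t WHERE fieldname REGEXP '\brip\b'"
).fetchone()
print(n)  # 2
```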
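The gory details are in the MySQL manual, section 11.4.2. (MySQL 8.0.4 and later use ICU regular expressions, which do not support [[:<:]] and [[:>:]]; use '\\bword\\b' there instead.)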
Something like LIKE or REGEXP will not scale (unless it's a leftmost prefix match).
Consider instead using a fulltext index for what you want to do.
select count(*) from yourtable where match(title, body) against ('some_word');
I have used the technique as described in the link below. The method uses length and replace functions of MySQL.
Keyword Relevance
If you want a search engine, I would advise something like Sphinx or Lucene. I find Sphinx (as an independent full-text indexer) a lot easier to set up and run. It runs fast and builds its indexes very quickly. Even if you were using MyISAM I would suggest it; it has a lot more power than MyISAM's full-text index.
It can also integrate (somewhat) with MySQL.
It depends on what DBMS you are using, some allow writing UDFs that could do this.