FULL TEXT only select if at least X words match - mysql

SELECT source, id
FROM memory_row
WHERE memory_id =10
AND MATCH(source)
AGAINST ('girl*, appears*, cool*, pragmatic*, things*, first*, glance*, actually*, warm*,
trusting*, created*, Design*, Children*, Togetsu*, existed*, solely*, activate*, Strings*,
experienced*, emotional*, damage*, young*, from*, experiments*, conducted*, her*, cruel*,
researchers*, sent*, Randall*, family*, treatment*, after*, sealing*, memories*, small*, world*,
older*, adoptive*, sister*, Naomi*')
The above query returns results in which only a few words match. I want to only return results that contain at least X of the terms that are being matched against. In the example above, that number can be 10. That means that the column must contain at least 10 fulltext matches to be returned.
How can I do this?
EDIT
One answer suggested the following. I get an error, "Incorrect arguments to AGAINST".
select m.*
from memory_row m
where
memory_id = 10
and (
select count(*)
from (
select 'girl*' word
union all select 'appears*'
union all select 'actually*'
union all select 'girl*'
union all select 'cool*'
union all select 'pragmatic*'
union all select 'things*'
union all select 'first*'
union all select 'glance*'
union all select 'actually*'
) w
where match(m.source) against(w.word)
) >= 5

I am unsure that there is an easy way to do what you want. You might need to enumerate the values as rows, and then use a correlated subquery to compute the count of matches:
select m.*
from memory_row m
where
memory_id = 10
and (
select count(*)
from (
select 'girl*' word
union all select 'appears*'
...
) w
where match(m.source) against(w.word)
) >= ?
where the question mark represents the minimum number of rows that should match.
Or, in very recent versions of MySQL:
select m.*
from memory_row m
where
memory_id = 10
and (
select count(*)
from (values row('girl*'), row('appears*'), ...) w(word)
where match(m.source) against(w.word)
) >= ?

One method is to use match to get an initial set of documents, and then apply additional logic afterwards:
SELECT mr.*
FROM (SELECT source, id
FROM memory_row
WHERE memory_id = 10 AND
MATCH(source) AGAINST ('girl*, appears*, cool*, pragmatic*, things*, first*, glance*, actually*, warm*,
trusting*, created*, Design*, Children*, Togetsu*, existed*, solely*, activate*, Strings*,
experienced*, emotional*, damage*, young*, from*, experiments*, conducted*, her*, cruel*,
researchers*, sent*, Randall*, family*, treatment*, after*, sealing*, memories*, small*, world*,
older*, adoptive*, sister*, Naomi*')
) mr
WHERE ( (source like '%girl%') +
(source like '%actually%') +
. . .
) >= 10;
Note: This is not exactly the same logic, because it is just looking for strings. If you want more precise logic, you can use regular expressions, but that might not be necessary.
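The sum-of-conditions idea can be generated from a list of terms instead of being typed out by hand. The following is only a sketch under stated assumptions: it uses Python with an in-memory SQLite database standing in for MySQL, the sample rows are made up, and rows_matching_at_least is a hypothetical helper name. As in the query above, it counts substring matches, not full-text matches.

```python
import sqlite3

def rows_matching_at_least(conn, terms, minimum):
    """Return (id, source) rows whose text contains at least `minimum` of `terms`."""
    if not terms:
        return []
    # One "(source LIKE ?)" per term; each evaluates to 1 or 0, so the sum
    # is the number of distinct terms found in the row.
    score = " + ".join("(source LIKE ?)" for _ in terms)
    sql = f"SELECT id, source FROM memory_row WHERE ({score}) >= ? ORDER BY id"
    params = [f"%{term}%" for term in terms] + [minimum]
    return conn.execute(sql, params).fetchall()

# Hypothetical sample data; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory_row (id INTEGER PRIMARY KEY, source TEXT)")
conn.executemany(
    "INSERT INTO memory_row (id, source) VALUES (?, ?)",
    [
        (1, "A girl who appears cool at first glance"),
        (2, "The girl is actually warm"),
        (3, "Nothing relevant here"),
    ],
)

terms = ["girl", "appears", "cool", "first", "glance", "warm"]
matches = rows_matching_at_least(conn, terms, 3)
print(matches)  # only row 1 contains three or more of the terms
```

With a MySQL driver the placeholder style would typically be %s rather than ?, and the generated condition could be appended to the MATCH ... AGAINST filter rather than replacing it.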

Related

union all two table but diff number of column

select count(*) as total FROM ( SELECT * FROM database1.orders WHERE number LIKE "11111111111111111" UNION ALL SELECT * FROM database2.orders WHERE number LIKE "11111111111111111" ) AS b
but I get this error:
The used SELECT statements have a different number of columns
Running SELECT * FROM database2.orders WHERE number LIKE "11111111111111111" on its own gives me an empty result.
How can I merge the two into a single query? I need a single query to drive the pagination.
Thanks for the help!
Just do the aggregation before the union all:
select sum(cnt) as total
FROM ((SELECT count(*) as cnt
FROM database1.orders
WHERE number LIKE '11111111111111111'
)
UNION ALL
(SELECT count(*) as cnt
FROM database2.orders
WHERE number LIKE '11111111111111111'
)
) t;
Note I changed the string delimiter to be a single quote rather than a double quote. It is good practice to use single quotes for string and date constants (and nothing else).
By the way, you can also do this using a join:
select o1.cnt1, o2.cnt2, (o1.cnt1 + o2.cnt2) as total
FROM (SELECT count(*) as cnt1
FROM database1.orders
WHERE number LIKE '11111111111111111'
) o1 cross join
(SELECT count(*) as cnt2
FROM database2.orders
WHERE number LIKE '11111111111111111'
) o2;
This makes it easier to get the individual counts for the two databases.
The orders table in database1 probably has a different number of columns than the table by the same name in database2.
Instead of using select *, select the columns you're interested in, like select userid, productid, deliveryaddress, .... Make sure you specify the same columns in both parts of the union.
For a count(*), you could choose no columns at all, and select the value 1 for each row, like:
select count(*)
from (
select 1
from database1.orders
where number like '111'
union all
select 1
from database2.orders
where number like '111'
) as SubQueryAlias
Or you can add the result of two subqueries without a union:
select (
select count(*)
from database1.orders
where number like '111'
)
+
(
select count(*)
from database2.orders
where number like '111'
)
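To see both the error and the fix side by side, here is a small sketch. It assumes Python with an in-memory SQLite database as a stand-in for MySQL, and two made-up tables, orders_db1 and orders_db2, that play the role of database1.orders and database2.orders with different column counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: same name in spirit, different number of columns.
conn.execute("CREATE TABLE orders_db1 (id INTEGER, number TEXT)")
conn.execute("CREATE TABLE orders_db2 (id INTEGER, number TEXT, note TEXT)")
conn.executemany("INSERT INTO orders_db1 VALUES (?, ?)",
                 [(1, "111"), (2, "111"), (3, "222")])
conn.executemany("INSERT INTO orders_db2 VALUES (?, ?, ?)",
                 [(1, "111", "x"), (2, "333", "y")])

# SELECT * on both sides of UNION ALL fails because the column counts differ.
try:
    conn.execute(
        "SELECT count(*) FROM ("
        "SELECT * FROM orders_db1 UNION ALL SELECT * FROM orders_db2)"
    )
    union_star_failed = False
except sqlite3.Error:
    union_star_failed = True

# Aggregating first means each branch returns exactly one column.
total = conn.execute(
    "SELECT sum(cnt) FROM ("
    "SELECT count(*) AS cnt FROM orders_db1 WHERE number LIKE ? "
    "UNION ALL "
    "SELECT count(*) AS cnt FROM orders_db2 WHERE number LIKE ?)",
    ("111", "111"),
).fetchone()[0]

print(union_star_failed, total)
```

The exact error text differs between SQLite and MySQL, but the cause is the same in both.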

Opposite function of find_in_set in mysql

SELECT find_in_set("1","1,2,3,4,5"); -- returns 1
Is there any function in MySQL that returns the non-matching values from a set of values? For example:
SELECT find_in_set("1","1,2,3,4,5");
Expected output: 2,3,4,5
Please help if such a function exists.
As far as I know, MySQL has no function that returns everything except the given input.
However, you can get what you want with something like this, adapted to your requirements:
SELECT * FROM table_name WHERE NOT find_in_set("1","1,2,3,4,5");
You can always use replace():
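select trim(both ',' from replace(concat(',', '1,2,3,4,5', ','), concat(',', '1', ','), ','))
This pads both the list and the search value with delimiters so that 10 won't be confused with 100. The matched value is replaced by a single comma so that its neighbours stay separated, and trim() strips the padding afterwards. If this isn't a problem, then you don't need the delimiters.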
Here's how you can do it by using SUBSTRING_INDEX.
Make sure to put "," before and after the value; otherwise the split can hit the value inside a longer one, i.e. if you look for 12 it will also split where the value is 123.
SELECT CONCAT(SUBSTRING_INDEX("11,22,33,44,55", ',22,', 1), ",", SUBSTRING_INDEX("11,22,33,44,55", ',22,', -1));
If this solution works for you, you could wrap it in a function, e.g. FIND_NOT_IN_SET, that takes two parameters and returns the final string.
I also posted another answer on How to select all the 'not' part against the 'in' set in MySQL? using prepared statement.
Using a series of UNIONed constants instead of a single value and a comma separated field:-
SELECT a.i
FROM
(
SELECT 1 i UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
) a
LEFT OUTER JOIN
(
SELECT 1 i
) b
ON a.i = b.i
WHERE b.i IS NULL
Or if only ever 1 value you are looking for:-
SELECT a.i
FROM
(
SELECT 1 i UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
) a
WHERE a.i != '1'
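If the comma-separated list is available to the application, the same result can be had without any string tricks in SQL. This is only a sketch in Python rather than a MySQL function, and not_in_set is a hypothetical name.

```python
def not_in_set(value, csv_list):
    """Return the items of a comma-separated list that differ from `value`."""
    # Splitting on the delimiter compares whole items, so "1" cannot
    # accidentally match inside "10" or "100".
    return ",".join(item for item in csv_list.split(",") if item != value)

print(not_in_set("1", "1,2,3,4,5"))  # 2,3,4,5
```

Because whole items are compared, the position of the value in the list (first, middle, or last) does not matter.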

MySQL select from custom set and compare with table data

Hi, I'm trying to find out which elements don't exist in my database. To do so, I want to compare a list of integers (output from an external script) with the data in a table. How can I do something like this:
SELECT * FROM (1,1,2,3,5,8,13...) l WHERE l NOT IN (select id from table1);
This is probably best done with a left outer join. But, your problem is creating the table of constants:
SELECT *
FROM (select 1 as id union all select 2 union all select 3 union all select 5 union all
select 8 union all select 13 union all select 21 . . .
) ids
where ids.id NOT IN (select id from table1);
This can have odd behavior, if table1.id is ever NULL. The following works more generally:
SELECT *
FROM (select 1 as id union all select 2 union all select 3 union all select 5 union all
select 8 union all select 13 union all select 21 . . .
) ids left outer join
table1 t1
on ids.id = t1.id
where t1.id is null;
EDIT:
The size of a MySQL query is limited by the parameter max_allowed_packet. In recent versions it can be set as high as 1 Gbyte. You should be able to fit 18,000 rows of:
select <n> union all
into that limit, quite easily. Gosh, I don't even think it would be 1 megabyte. I would say, though, that passing a list of 18,000 ids through the application seems inefficient. It would be nice if one database could just pull the data from the other database, without going through the application.
If the set to compare is huge, I'd recommend creating a temporary table myids with a single column id, loading all 18K values into it, and running a query like this:
select id from myids where myids.id not in (select id from table1);
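The NULL caveat mentioned earlier applies to the temporary-table variant as well, and it is easy to reproduce. The sketch below assumes Python with an in-memory SQLite database standing in for MySQL (both follow the same three-valued logic for NOT IN); the sample ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
# Note the NULL id: this is what trips up NOT IN.
conn.executemany("INSERT INTO table1 (id) VALUES (?)",
                 [(1,), (2,), (8,), (None,)])

# Ids coming from the external script, loaded into a temporary table.
candidates = [1, 1, 2, 3, 5, 8, 13]
conn.execute("CREATE TEMP TABLE myids (id INTEGER)")
conn.executemany("INSERT INTO myids (id) VALUES (?)",
                 [(n,) for n in candidates])

# NOT IN yields no rows at all once the subquery contains a NULL.
not_in = conn.execute(
    "SELECT DISTINCT id FROM myids "
    "WHERE id NOT IN (SELECT id FROM table1) ORDER BY id"
).fetchall()

# The left join / IS NULL form is unaffected by the NULL.
anti_join = conn.execute(
    "SELECT DISTINCT m.id FROM myids m "
    "LEFT JOIN table1 t ON m.id = t.id "
    "WHERE t.id IS NULL ORDER BY m.id"
).fetchall()

print(not_in)     # []
print(anti_join)  # [(3,), (5,), (13,)]
```

If table1.id is declared NOT NULL (for example, as a primary key), both forms return the same rows.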

Changing a Query with a numbered result set (with gaps,) to return result with no gaps, containing every number.

I have a select statement: select a, b, [...]; which returns the results:
a|b
---------
1|8688798
2|355744
4|457437
7|27834
I want it to return:
a|b
---------
1|8688798
2|355744
3|0
4|457437
5|0
6|0
7|27834
An example query that does not do what I would like, since it does not have the gap numbers:
select
sub.num_of_ratings,
count(sub.rater)
from
(
select
r.rater_id as rater,
count(r.id) as num_of_ratings
from ratings r
group by rater
) as sub
group by num_of_ratings;
Explanation of the query:
If a user rates another user, the rating is listed in the table ratings and the id of the rating user is kept in the field rater_id. Effectively I check for all users who are referred to in ratings and count how many ratings records I find for that user, which is rater / num_of_ratings, and then I use this result to find how many users have rated a given number of times.
At the end I know how many users rated once, how many users rated twice, etc. My problem is that the numbers for count(sub.rater) start fine from 1,2,3,4,5... However, for bigger numbers there are gaps. This is because there might be one user who rated 1028 times - but no user who rated 1027 times.
I don't want to apply stored procedures looping over the result or something like that. Is it possible to fill those gaps in the result without using stored procedures, looping, or creating temporary tables?
If you have a sequence of numbers, then you can do a JOIN with that table and fill in the gaps properly.
You can check out this question on how to get the sequence:
generate an integer sequence in MySQL
Here is one of the answers posted that might be easily used with the limitation that generates numbers from 1 to 10,000:
SELECT @row := @row + 1 as `row` FROM
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t2,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t3,
(select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) t4,
(SELECT @row:=0) t5
Using a sequence of numbers, you can join your result set. For instance, assuming your number list is in a table called numbersList, with column number:
Select number, Count
from
numbersList left outer join
(select
sub.num_of_ratings,
count(sub.rater) as Count
from
(
select
r.rater_id as rater,
count(r.id) as num_of_ratings
from ratings r
group by rater
) as sub
group by num_of_ratings) as num
on num.num_of_ratings=numbersList.number
where numbersList.number<max(num.num_of_ratings)
Your numbers list must be larger than your largest value, obviously, and the restriction will allow it to not have all numbers up to the maximum. (If MySQL does not allow that type of where clause, you can either leave the where clause out to list all numbers up to the maximum, or modify the query in various ways to achieve the same result.)
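The join against a numbers table can be tried out on the sample data from the question. This is a sketch under assumptions: Python with an in-memory SQLite database stands in for MySQL, a made-up table named counts holds the (a, b) result set from the question, and numbersList is filled with 1 to 100. The upper bound is expressed as a scalar subquery rather than an aggregate in the where clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the gappy result set shown in the question.
conn.execute("CREATE TABLE counts (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO counts VALUES (?, ?)",
                 [(1, 8688798), (2, 355744), (4, 457437), (7, 27834)])

# The sequence table; it must reach at least the largest value of a.
conn.execute("CREATE TABLE numbersList (number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO numbersList VALUES (?)",
                 [(n,) for n in range(1, 101)])

filled = conn.execute(
    """
    SELECT n.number AS a, COALESCE(c.b, 0) AS b
    FROM numbersList n
    LEFT JOIN counts c ON c.a = n.number
    WHERE n.number <= (SELECT MAX(a) FROM counts)
    ORDER BY n.number
    """
).fetchall()

for row in filled:
    print(row)
```

Rows 3, 5 and 6 have no partner in counts, so the left join produces NULL for b and COALESCE turns it into 0.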
@mazzucci: the query is too magical and you are not actually explaining the query.
@David: I cannot create a table for that purpose (as stated in the question)
Basically what I need is a select that returns a gap-less list of numbers. Then I can left join on that result set and treat NULL as 0.
What I need is an arbitrary table that keeps more records than the length of the final list. I use the table user for that in the following example:
select @row := @row + 1 as idx
from (select @row := -1) r, users u
limit 101;
This query returns the numbers from 0 to 100. Using it as a subquery in a left join finally fills the gaps.
users is just a dummy table that keeps the relational engine going and hence produces the numbers incrementally.
select t1.idx as a, ifnull(t2.b, 0) as b
from (
select @row := @row + 1 as idx
from (select @row := 0) r, users u
limit 7
) as t1
left join (
select a, b [...]
) as t2
on t1.idx = t2.a;
I didn't try this exact query live, so have mercy on me if there is a small flaw. But technically it works, and you get my point.
EDIT:
I just used this concept to build a gapless list of dates to left join measurements onto:
select @date := date_add(@date, interval 1 day) as date
from (select @date := '2010-10-14') d, users u
limit 700
This starts at 2010-10-15 and continues for 699 more days.

Counting word occurrences in a table column

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurrences of each word in a group of rows from this table.
If there are 2 rows with these fields:
"I like to eat bananas"
"I don't like to eat like a monkey"
I want to get
word | count()
---------------
like 3
eat 2
to 2
i 2
a 1
Any idea? I am using MySQL 5.2.
@Elad Meidar, I like your question and I found a solution:
SELECT SUM(total_count) as total, value
FROM (
SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
FROM table_name t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
ORDER BY value
) AS x
GROUP BY x.value
) AS y
GROUP BY value
Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1
First we run a query to extract all the words, as explained here by @peterm (follow his instructions if you want to change the maximum number of words processed). We then turn that into a subquery and COUNT and GROUP BY each word. Finally, we wrap another query around that to merge words that differ only by attached punctuation, e.g. hello = hello!, using REPLACE.
I would recommend not doing this in SQL at all. You're loading the DB with work it isn't best at. Selecting a group of rows and doing the frequency calculation on the application side will be easier to implement, will run faster, and will cause fewer maintenance headaches.
You can try this slightly hacky approach, which counts the occurrences of one given word per row:
SELECT
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
FROM table_name
ORDER BY `count` DESC
This query can be very slow. Also, it looks pretty ugly.
I think you should treat this like indexing, with an additional table.
Whenever you create, update, or delete a row in your original table, you should update your indexing table. That indexing table should have the columns: word, and the number of occurrences.
I think you are trying to do too much with SQL if all the words are in one field of each row. I recommend to do any text processing/counting with your application after you grab the text fields from the db.
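For the application-side route that several answers recommend, a minimal sketch in Python could look like the following. It assumes the text fields have already been fetched from the database as a list of strings; word_counts is a hypothetical helper name, and the tokenizing rule (lowercase letters and apostrophes) is an assumption, not something the question specifies.

```python
import re
from collections import Counter

def word_counts(sentences):
    """Count how often each word occurs across all given strings."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase first so "I" and "i" are counted together; keep
        # apostrophes so "don't" stays one word, drop other punctuation.
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    return counts

rows = ["I like to eat bananas", "I don't like to eat like a monkey"]
counts = word_counts(rows)
for word, n in counts.most_common():
    print(word, n)
```

Counter.most_common() returns the words ordered by descending count, which corresponds to the ordering shown in the question's expected output.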