I have two tables:
Classifieds, containing a body text.
Keywords, containing keyword combinations like "silver ring".
I am trying to find out how many exact matches there are in the text field for each keyword,
e.g.:
chihuahua bilder 30
chihuahua charakter 230
Somehow my SQL-Statement is missing something:
SELECT k.keyword, count(*) AS c
FROM `classifieds` c, keywords k
WHERE c.text LIKE concat('%', + k.keyword + '%')
GROUP BY keyword
The count value is always the same for each keyword.
Does somebody have an idea where the error is? Thank you for any help.
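In MySQL, + is numeric addition, not string concatenation, so in your query the keyword never becomes part of the LIKE pattern: every keyword is compared against the same pattern, which is why the counts are all identical. Pass the pieces to CONCAT as separate arguments instead. Here is a slightly tricky query that counts the occurrences of each keyword in each text: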
SELECT c.text, k.keyword,
(LENGTH(c.text) - LENGTH(REPLACE(c.text, k.keyword, ''))) / LENGTH(k.keyword) AS occurrences
FROM classifieds c INNER JOIN keywords k
ON c.text LIKE CONCAT('%', k.keyword, '%');
Let $replace = REPLACE(c.text, k.keyword, '')
This removes every occurrence of the keyword from the text.
Let $len = LENGTH(c.text) - LENGTH($replace)
This is the number of characters that were removed from the text.
$len / LENGTH(k.keyword)
Dividing by the keyword's length finally gives the number of times the keyword occurs in the text.
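The length-difference trick can be checked end to end. The following is a minimal sketch, not the poster's setup: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are invented, and INSTR replaces the LIKE/CONCAT join because SQLite spells concatenation differently. It also adds SUM with GROUP BY so that the result is one total per keyword, which is what the question asks for.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classifieds (text TEXT);
CREATE TABLE keywords (keyword TEXT);
INSERT INTO classifieds VALUES
  ('silver ring and another silver ring'),
  ('gold ring'),
  ('silver ring for sale');
INSERT INTO keywords VALUES ('silver ring'), ('gold ring');
""")

# Characters removed by REPLACE, divided by the keyword length,
# gives the occurrences per row; SUM adds them up per keyword.
rows = conn.execute("""
SELECT k.keyword,
       SUM((LENGTH(c.text) - LENGTH(REPLACE(c.text, k.keyword, '')))
           / LENGTH(k.keyword)) AS occurrences
FROM classifieds c
JOIN keywords k ON INSTR(c.text, k.keyword) > 0
GROUP BY k.keyword
""").fetchall()

counts = dict(rows)
print(counts)
```

Note that REPLACE only sees non-overlapping, case-sensitive occurrences, whereas LIKE in MySQL is usually case-insensitive under the default collations, so the two can disagree on mixed-case text.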
I have two tables in my SQL database: sentences, with thousands of sentences, and words, with thousands of words. I want to compute the percentage of the vocabulary in the words table that is actually used in the sentences table. For example, if the sentences table contained these records:
and the words table contained these records:
it would return 50% used vocabulary. Is there a simple way to do that with SQL? I already have a REGEXP pattern to check whether a sentence contains a specific word: "\b$word\b[^']".
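This is a simple way, but it is not efficient:
select count(*) / (select count(*) from words)
from words w
where exists (select 1
from sentences s
where s.sentence regexp concat('\\b', w.word, '\\b[^'']')
);
Note: Constructing the string literals is a little tricky. Inside a MySQL string literal the backslash is itself an escape character, so \b has to be written as \\b, and the embedded single quote has to be doubled. Depending on how you are calling the query, the host language may need a further level of escaping. Also, \b is only understood by the regex engine in MySQL 8.0 and later; older versions use the [[:<:]] and [[:>:]] word-boundary markers instead.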
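To see the EXISTS approach produce a percentage, here is a small sketch using Python's sqlite3 module rather than MySQL. Everything in it is an assumption made for illustration: the sample rows are invented, SQLite has no built-in REGEXP so a Python function is registered for it, and || is SQLite's concatenation operator standing in for CONCAT.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    return int(re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.executescript("""
CREATE TABLE sentences (sentence TEXT);
CREATE TABLE words (word TEXT);
INSERT INTO sentences VALUES ('The cat sat on the mat'), ('Dogs bark');
INSERT INTO words VALUES ('cat'), ('dog'), ('mat'), ('bird');
""")

# A word counts as "used" if at least one sentence contains it
# as a whole word; each word is counted at most once.
query = r"""
SELECT 100.0 * COUNT(*) / (SELECT COUNT(*) FROM words)
FROM words w
WHERE EXISTS (SELECT 1
              FROM sentences s
              WHERE s.sentence REGEXP ('\b' || w.word || '\b'))
"""
percentage = conn.execute(query).fetchone()[0]
print(percentage)
```

Here "cat" and "mat" are found, while "dog" is not (the sentence contains "Dogs", which is not a whole-word match) and "bird" does not occur at all. Words that contain regex metacharacters would need to be escaped before being spliced into the pattern.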
Another way:
In this query, a word that appears twice in the same sentence is counted only once (but a word found in several sentences is counted once per sentence):
SELECT
(SELECT COUNT(*)
FROM sentences s
CROSS JOIN words w
WHERE LOWER(s.sentence) LIKE CONCAT('%', LOWER(w.word), '%'))
/ (SELECT COUNT(*) FROM words) * 100 AS percentage
This version counts every appearance of each word in each sentence:
SELECT
(SELECT SUM(ROUND((LENGTH(s.sentence) - LENGTH(REPLACE(s.sentence, w.word, ''))) / LENGTH(w.word)))
FROM sentences s
CROSS JOIN words w
WHERE s.sentence REGEXP CONCAT('[[:<:]]', w.word, '[[:>:]]'))
/
(SELECT count(*) FROM words) * 100 AS percentage
While editing someone else's code, I found this query:
SELECT c.name AS category_name,
p.id,
p.name,
p.description,
p.price,
p.category_id,
p.created
FROM products p
LEFT JOIN categories c
ON p.category_id = c.id
WHERE p.name LIKE '%keyword%' escape '!'
OR p.description LIKE '%keyword%' escape '!'
ORDER BY p.name ASC
LIMIT 0, 6
I understand everything except the escape '!' on lines 11 and 12 of the query (the WHERE and OR conditions). I guess it is related to escaping; if so, I don't know whether it is better to handle it before the query (the surrounding code is PHP) or to leave the job to the DB engine. And what does the '!' symbol mean?
Thanks in advance.
The ESCAPE keyword is used to escape pattern-matching characters, such as the percent sign (%) and the underscore (_), when they form part of the data.
Let's suppose that we want to check for the string "67%". We can use:
LIKE '67#%%' ESCAPE '#';
If we want to search for the movie "67% Guilty", we can use the script shown below to do that.
SELECT * FROM movies WHERE title LIKE '67#%%' ESCAPE '#';
Note the double "%%" in the LIKE clause: the first "%", preceded by the escape character, is treated as part of the string to be searched for. The second one is a wildcard that matches any number of characters that follow.
The same query will also work if we use something like
SELECT * FROM movies WHERE title LIKE '67=%%' ESCAPE '=';
You don't need ESCAPE in this particular query, but if you ever do (i.e. if your search term contains the % character), you will need an escape character to differentiate between the % that is part of your search term and the other % characters that serve as wildcards in the LIKE pattern.
In that case, you escape the % in your search term with the character you defined using the ESCAPE keyword.
For instance, say you want to look for the string 25% in the name field. You would write:
WHERE p.name LIKE '%25!%%' escape '!'
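On the question of whether to escape in application code or in the database: the search term has to be escaped before it is placed in the pattern, and ESCAPE '!' then tells the engine which character was used. Below is a minimal sketch of that split. It is written in Python with sqlite3 instead of PHP and MySQL, and escape_like, the products rows, and the choice of '!' are all illustrative assumptions.

```python
import sqlite3


def escape_like(term, escape_char="!"):
    # Hypothetical helper: escape the escape character first,
    # then the two LIKE wildcards, so they are matched literally.
    for ch in (escape_char, "%", "_"):
        term = term.replace(ch, escape_char + ch)
    return term


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("25% off shoes",), ("2500 nails",), ("save 25% today",)],
)

sql = "SELECT name FROM products WHERE name LIKE ? ESCAPE '!' ORDER BY name"

# Escaped: the % typed by the user is literal, the outer % are wildcards.
pattern = "%" + escape_like("25%") + "%"
matches = [row[0] for row in conn.execute(sql, (pattern,))]

# Unescaped: every % is a wildcard, so "2500 nails" matches as well.
unescaped = [row[0] for row in conn.execute(sql, ("%25%%",))]

print(pattern, matches, len(unescaped))
```

The pattern is still passed as a bound parameter; escaping the wildcards is a separate concern from protecting the query against SQL injection.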
I'm trying to join two tables - let's call them table1 and table2 - in MySQL based on a text column in each table. table1.text is all sentences, and I need to join on table2.text where the word or phrase from table2.text appears in the sentence for table one.
The tricky part is that if the phrase from table2.text is surrounded by **, then it needs to be an exact match for that word. If not, and it's just a normal phrase, it can be a substring match, so a word like can in table2.text would match the sentence I have cans in table1.text. However, **can** in table2.text would not match I have cans in table1.text.
So far I've been thinking this:
select a.text, replace(b.text,'**',' ')
from table1 a join
table2 b
on a.text like CONCAT('%', b.text, '%');
But that doesn't account for ** words that appear at the beginning of a sentence or before punctuation. Any ideas?
This query will do what you want. It checks the value of table2.text to see if it matches the **word** format, and if not, just uses a LIKE comparison to see if the word is in table1.text. If table2.text matches the **word** format, it uses a REGEXP test to ensure that table2.text only occurs in table1.text as a whole word (using the [[:<:]] and [[:>:]] word delimiters; on MySQL 8.0.4 and later, which use a different regex library, write \\b instead).
SELECT a.text, REPLACE(b.text, '**', '')
FROM table1 a
JOIN table2 b
ON b.text NOT REGEXP('\\*\\*[a-z]+\\*\\*') AND a.text LIKE CONCAT('%', b.text, '%') OR
a.text REGEXP CONCAT('[[:<:]]', REPLACE(b.text, '**', ''), '[[:>:]]')
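The matching rule that the join condition encodes can be spelled out outside SQL. This is only a sketch of the rule as described in the question, written in Python; phrase_matches is a hypothetical name, and the comparison is case-sensitive here, which may differ from the collation in use on the MySQL side.

```python
import re


def phrase_matches(sentence, phrase):
    # **word** means: match only as a whole word.
    if len(phrase) > 4 and phrase.startswith("**") and phrase.endswith("**"):
        word = phrase[2:-2]
        pattern = r"\b" + re.escape(word) + r"\b"
        return re.search(pattern, sentence) is not None
    # Plain phrase: a simple substring match is enough.
    return phrase in sentence


print(phrase_matches("I have cans", "can"))
print(phrase_matches("I have cans", "**can**"))
print(phrase_matches("Yes we can.", "**can**"))
```

The word-boundary test also covers the cases the question worries about: a starred word at the start of a sentence or directly before punctuation still counts as a whole-word match.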
If I understand your issue correctly, something like this may work (REGEXP_LIKE requires MySQL 8.0 or later):
SELECT a.text, replace(b.text,'**',' ')
FROM table1 a, table2 b
WHERE REGEXP_LIKE (a.text, b.text);
In the table tags I have the field id. In the table tastings I have a field tags which is a list of id numbers separated by commas, for example 2,4,5. The database is MySQL.
Now I am trying to count how many times each tag is used in total. But I am stuck with the LIKE part. I have tried the following, all giving a syntax error:
SELECT tags.id, tag, FROM tags, tastings WHERE tags LIKE tags.id + '%'
SELECT tags.id, tag, FROM tags, tastings WHERE tags LIKE tags.id & '%'
SELECT tags.id, tag, FROM tags, tastings WHERE tags LIKE CONCAT(tags.id, '%')
What am I doing wrong?
It helps if you post the error, but you have an extra comma after the select list. Also, you might need to qualify tags, since it is both a table and a column.
Try this:
SELECT tags.id, tag
FROM tags, tastings
WHERE tastings.tags LIKE CONCAT('%', tags.id, '%')
or better, use the new join syntax:
SELECT tags.id, tag
FROM tags
JOIN tastings on tastings.tags LIKE CONCAT('%', tags.id, '%')
Note the sandwiching of tags.id between % signs, so that the id is found anywhere in the list.
Warning: This join will hit id 4 when tags are 13,14,15 (there's a 4 in 14), so unless your ids are all less than 10, you'll need to rethink your join criteria.
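One common way around the false positives is to wrap both the list and the id in commas before comparing. The sketch below shows the difference; it runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, and the sample rows and tag names are invented. In MySQL the same condition would be written with CONCAT, or with FIND_IN_SET(tags.id, tastings.tags).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE tastings (id INTEGER PRIMARY KEY, tags TEXT);
INSERT INTO tags VALUES (4, 'oaky'), (14, 'fruity');
INSERT INTO tastings VALUES (1, '13,14,15'), (2, '2,4,5');
""")

# Plain substring match: id 4 also hits the list '13,14,15'.
naive = conn.execute("""
SELECT tags.id, COUNT(*)
FROM tags
JOIN tastings ON tastings.tags LIKE '%' || tags.id || '%'
GROUP BY tags.id
ORDER BY tags.id
""").fetchall()

# Comma-delimited match: ',2,4,5,' contains ',4,' but ',13,14,15,' does not.
delimited = conn.execute("""
SELECT tags.id, COUNT(*)
FROM tags
JOIN tastings ON (',' || tastings.tags || ',') LIKE ('%,' || tags.id || ',%')
GROUP BY tags.id
ORDER BY tags.id
""").fetchall()

print(naive, delimited)
```

This still cannot use an index on the list column, so a separate table with one row per tasting and tag remains the more robust design.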
You haven't specified your DBMS so I'm taking a guess at what is supported.
How about
SELECT tags.id, tastings.tag
FROM tags
INNER JOIN tastings
ON tastings.tags like '%' + tags.id + '%'
This should work (again, depending on your DBMS), but you should really normalize your data. This type of query isn't going to scale or perform well.
Try the query below.
select t.id, count(t.id)
from tags t, tastings tas
where tas.tags like concat('%', t.id, '%')
group by t.id;
I have this scenario:
I want to check for particular words, and if they match a term, I will have to update the content of that page and link it to the term. But for now I am focusing on getting the content pages which have a part of the content the same as a particular term.
This is an idea of what I need to do, but it is not working since the subquery returns more than one row.
I want to find WHERE m.module_content is LIKE any of the terms I have, but it should check with them all.
SELECT m.module_termid, t.term_name, m.module_name, m.module_content
FROM modules m
JOIN terms t ON m.module_termid = t.term_id
WHERE m.module_content LIKE '%' || (SELECT term_name FROM terms) || '%'
module_content contains text in HTML format, so eventually all I would need to do is: if it matches a term and is not yet a link, add a link to that particular term.
What is the best option to do here? (I am using mysql btw)
To give you an example of what the expected result is:
Terms: id: 1, name: hello
Modules: id: 1, content: <p>Hello World</p>
I would like that modules with id 1 is brought up, since it contains content which somewhere has the term name "hello"
Updated:
Tried Pablo's solution but this is what happens:
"Ray Davis" has nothing to do with the term "Float" for example, so that should not have appeared.
I think you just need to change your JOIN condition to something like:
SELECT m.module_termid, t.term_name, m.module_name, m.module_content
FROM modules m
JOIN terms t ON (m.module_content LIKE '%' || t.term_name || '%')
Having said that, this could potentially be very inefficient. Consider using a FULLTEXT index for this operation instead.
After a bit of research, my solution would look like this:
SELECT m.module_termid, t.term_name, m.module_name, m.module_content
FROM modules m
INNER JOIN terms t ON m.module_termid = t.term_id
WHERE m.module_content LIKE CONCAT('%', TRIM(t.term_name), '%')
edit: Regarding Paul Morgan's comment, I replaced CONCAT('%', t.term_name, '%') with CONCAT('%', TRIM(t.term_name), '%') so that leading and trailing whitespace in t.term_name is stripped off. If you need that whitespace, just remove the TRIM call and use the old version (CONCAT('%', t.term_name, '%')).
By default, MySQL does not treat || as a concatenation operator (it is a synonym for OR), so the query should actually be written as:
SELECT m.module_termid, t.term_name, m.module_name, m.module_content
FROM modules m
JOIN terms t ON m.module_content LIKE CONCAT('%', t.term_name, '%');
What actually happened is that
m.module_content LIKE '%' || t.term_name || '%'
is equivalent to
(m.module_content LIKE '%') || (t.term_name) || ('%')
where || is a logical OR. This is 1 for every row with non-NULL content, because any string matches LIKE '%'. Thus, you have a Cartesian product =)
UPD: more as a reference to myself, MySQL does have a concatenation operator ||, but to use it one should set PIPES_AS_CONCAT mode:
mysql> SET sql_mode= 'pipes_as_concat';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'qwe' || 'asd';
+----------------+
| 'qwe' || 'asd' |
+----------------+
| qweasd         |
+----------------+
1 row in set (0.00 sec)
You may try this instead:
SELECT m.module_termid, t.term_name, m.module_name, m.module_content
FROM modules m
JOIN terms t ON (m.module_content LIKE '%' + t.term_name + '%')
Instead of LIKE, using IN might be the solution. Something like:
SELECT m.module_termid, t.term_name, m.module_name, m.module_content
FROM modules m JOIN terms t ON m.module_termid = t.term_id
WHERE m.module_content IN (SELECT term_name FROM terms);
Try the below query -
SELECT
tp.module_termid,
tp.term_name,
tp.module_name,
tp.module_content
FROM (
SELECT
m.module_termid,
t.term_name,
m.module_name,
m.module_content,
-- empty string for non-matching rows, so the outer filter drops them
IF(LOCATE(t.term_name, m.module_content) != 0, m.module_content, '')
as required_content
FROM modules m
LEFT JOIN terms t ON m.module_termid = t.term_id
) tp
WHERE tp.required_content != '';
With the above query you will get all rows where the term_name value occurs anywhere in the module_content column of the modules table, as a plain substring. If you want to match whole words only, you can use MySQL's regular expression functions in place of the LOCATE function.
The LOCATE function is documented in the string functions section of the MySQL reference manual.
I don't think this is a good way to solve the problem. Suppose that you have a lot of module items and the set of popular words is limited. Each time you execute the SQL, it needs a lot of disk I/O and may block the online MySQL database.
My way is like this:
Build an inverted index of the module content.
Search for the popular words using the index.
Bind the module id to the keyword.
As you can see, this is efficient and fast. So the problem is how to build an inverted index on the module content; Sphinx will do a good job of that.
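To make the three steps concrete, here is a toy in-memory version in Python. It is only a sketch of the idea, not how Sphinx works internally: build_inverted_index is a hypothetical helper, the sample modules and terms are invented, HTML is stripped with a crude regular expression, and only single-word terms are handled.

```python
import re
from collections import defaultdict


def build_inverted_index(modules):
    # Step 1: map every lower-cased word to the ids of the modules containing it.
    index = defaultdict(set)
    for module_id, content in modules.items():
        text = re.sub(r"<[^>]+>", " ", content)  # crude tag removal
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(module_id)
    return index


modules = {
    1: "<p>Hello World</p>",
    2: "<p>Ray Davis</p>",
    3: "<p>Float on, hello again</p>",
}
terms = {1: "hello", 2: "float"}

index = build_inverted_index(modules)

# Steps 2 and 3: look each term up in the index and bind module ids to it.
term_to_modules = {
    name: sorted(index.get(name.lower(), set())) for name in terms.values()
}
print(term_to_modules)
```

Each lookup is a dictionary access rather than a scan of every module, and because the index is built from whole words, "Ray Davis" is never linked to "float".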
Hope this helps :)