MySQL, Finding every string/words frequency in a column - mysql

I want to find the frequency of every word in a column using MySQL only (if possible). For example:
Table:
id message
1 I want to eat pizza
2 I wanted chocolates
3 He doesn't like me
Query: ???
Result:
Word Frequency
I 2
want 1
to 1
eat 1
pizza 1
wanted 1
etc..
Is it possible? If so please help, thank you

You need to split the data. This is a pain:
select substring_index(substring_index(message, ' ', n.n), ' ', -1) as word,
count(*)
from (select 1 as n union all select 2 union all select 3 union all
select 4 union all select 5
) n join
t
on n.n <= 1 + length(message) - length(replace(message, ' ', ''))
group by word;
The above assumes that all messages are five words or less. You can extend the number in the first subquery for longer messages.
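To make the splitting trick easier to follow, here is a minimal Python sketch of what the query does row by row. The helper `substring_index` is a hypothetical emulation of MySQL's SUBSTRING_INDEX (written for this illustration, not a library function), and `max_words` plays the role of the 1..5 derived table.

```python
from collections import Counter


def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical emulation of MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th delimiter from the right
    return ""


def word_frequency(messages, max_words=5):
    """Mirror the query: join each message to n = 1..max_words and pick the n-th word."""
    counts = Counter()
    for message in messages:
        # Same expression as the join condition: number of spaces + 1.
        n_words = 1 + len(message) - len(message.replace(" ", ""))
        for n in range(1, max_words + 1):
            if n <= n_words:
                word = substring_index(substring_index(message, " ", n), " ", -1)
                counts[word] += 1
    return counts


messages = ["I want to eat pizza", "I wanted chocolates", "He doesn't like me"]
freq = word_frequency(messages)
```

As in the SQL, any message longer than `max_words` words is silently truncated, which is why the numbers list has to be extended for longer messages.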

Here is a PHP example. You will probably have to tweak it a bit.
Let's assume you have a word_frequency table with a unique column word and an integer column count. The insert uses a prepared statement, so words containing quotes (such as doesn't) cannot break the query or inject SQL. This should get you started.
<?php
$con = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (mysqli_connect_errno()) {
    exit("Failed to connect to MySQL: " . mysqli_connect_error());
}
// Prepare once and execute per word; $word is bound by reference.
$stmt = mysqli_prepare($con, "INSERT INTO word_frequency (`word`,`count`) VALUES (?,1) ON DUPLICATE KEY UPDATE `count`=`count`+1");
mysqli_stmt_bind_param($stmt, "s", $word);
$results = mysqli_query($con, "SELECT message FROM table1");
while ($row = $results->fetch_assoc()) {
    $words = explode(" ", $row['message']);
    foreach ($words as $w) {
        $word = $w; // update the bound variable, then run the insert
        mysqli_stmt_execute($stmt);
    }
}
mysqli_stmt_close($stmt);
mysqli_close($con);

Related

How to extract a specific word after # and before it ends? MySQL

So I have a table posts and I need to extract specific tags from the column description.
I need to extract f.e. #sunny and #lovelyday and print it out separately from the full text description. For example:
description: "today was a good day! #sunny #lovelyday" -> extract #sunny & #lovelyday
I searched for answers, but perhaps I do not understand it. (still learning)
phpMyAdmin doesn't support charindex() or len(), so I tried locate() and length(), but my query isn't correct.
SELECT SUBSTRING(description, locate("#",description) + length("#")) AS Tags
from posts
What am I doing wrong?
Edit: I need the "#%" from textfield and extract it
SET @string := 'today was a good day! #sunny #lovelyday';
WITH RECURSIVE
cte AS ( SELECT 0 n
UNION ALL
SELECT n + 1
FROM cte
WHERE n < (SELECT LENGTH(@string) - LENGTH(REPLACE(@string, '#', ''))))
SELECT n, SUBSTRING_INDEX(SUBSTRING_INDEX(@string, '#', -n), ' ', 1)
FROM cte
WHERE n;
n | SUBSTRING_INDEX(SUBSTRING_INDEX(@string, '#', -n), ' ', 1)
-: | :---------------------------------------------------------
1 | lovelyday
2 | sunny
db<>fiddle here
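If the description can be fetched into application code instead, the same extraction is a one-line regular expression. This is only a sketch of that alternative in Python, not part of the SQL answer; `extract_tags` is a hypothetical helper name, and it assumes a tag is `#` followed by letters, digits, or underscores.

```python
import re


def extract_tags(description: str) -> list[str]:
    """Return the hashtags in order of appearance, including the leading '#'."""
    # Assumption: a tag is '#' followed by one or more word characters.
    return re.findall(r"#\w+", description)


tags = extract_tags("today was a good day! #sunny #lovelyday")
```

Unlike the recursive query, which walks the string from the right and returns lovelyday before sunny, this keeps the tags in their original order.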

MYSQL Letters before numbers descending

I am trying to get letters before numbers with a descending order and am finding it very difficult to do so.
I have tried using Case ordering
ORDER BY CASE WHEN ' . $this->sortbycast . ' LIKE \'%[a-z]%\' THEN 0 WHEN ' . $this->sortbycast . ' LIKE \'%[0-9]%\' THEN 1 ELSE ' . $this->sortbycast . ' END DESC
Regular expression as follows
' ORDER BY ' . $this->sortbycast . ' REGEXP \'^[a-z]\', ' . $this->sortbycast . ' desc';
and CAST()
' ORDER BY CAST(' . $this->sortbycast . ' AS UNSIGNED), ' . $this->sortbycast;
But none of these have had the desired result. All work correctly with ascending order. Any help would be appreciated. Thanks.
Update due to Strawberry's response
DB Fiddle here
My present query looks like this - reformatted for legibility:
SELECT p.*
, m.data
, m.id author_id
FROM anchor_posts p
JOIN anchor_post_meta m
ON m.post = p.id
WHERE p.category = 5
AND m.extend = 3
ORDER
BY CAST('anchor_post_meta.data' AS UNSIGNED) -- NOTE: THIS IS A STRING !?!
, m.data DESC
LIMIT 12;
The desired result
The desired result would be to have items show in the following order:
data
==============
A Title
B Title
C Title
D Title
70000
60000
45000
30000
25000
12000
Please see the following version of the DB Fiddle to see more clearly.

How can I extract a last name from full name in mysql?

I have a table with fullname column. I want to make a query for finding a person via his last name but his last name is in the full name column.
Would it matter if it accidentally returned someone whose first name matched your query?
A simple query would be:
SELECT *
FROM TABLE
WHERE fullname LIKE '%insertlastname%'
If you want to define the last name as the name after the last space:
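SELECT substring_index(fullname, ' ', -1) as lastname
FROM TABLE
-- a column alias cannot be referenced in WHERE, so repeat the expression
WHERE substring_index(fullname, ' ', -1) = 'insertlastname'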
Two suboptimal answers, but some answers at least.
You can use this if you want to fetch by query:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX( `fullname` , ' ', 2 ),' ',1) AS b,
SUBSTRING_INDEX(SUBSTRING_INDEX( `fullname` , ' ', -1 ),' ',2) AS c FROM `users` WHERE `userid`='1'
But you can also fetch the last name in PHP, using the explode function.
Example:
$full_name = "row moin";
$pieces = explode(" ", $full_name);
echo $first_name = $pieces[0]; // row
echo $last_name = $pieces[count($pieces) - 1]; // moin
A simple answer: suppose we have the name Charles Dickens:
SELECT * FROM TABLE_NAME WHERE SUBSTRING_INDEX(FULLNAME,' ',-1) like '%Dickens';
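The "text after the last space" rule used by the SUBSTRING_INDEX answers can be sketched in Python as follows. `last_name` is a hypothetical helper, and stripping surrounding whitespace first is an added assumption that the SQL versions do not make.

```python
def last_name(full_name: str) -> str:
    """Return everything after the last space, or the whole name if there is no space."""
    # Assumption: leading/trailing whitespace should not count as part of the name.
    return full_name.strip().rsplit(" ", 1)[-1]


examples = {name: last_name(name) for name in ["Charles Dickens", "Charles John Huffam Dickens", "Cher"]}
```

Like the SQL, this treats the final word as the last name, so multi-word surnames would need a different rule.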

SQL: match a string pattern irrespective of its case, whitespaces in a column

I need to find the frequency of a string in a column, irrespective of its case and any white spaces.
For example, if my string is My Tec Bits and it occurs in my table as shown below:
061 MYTECBITS 12123
102 mytecbits 24324
103 MY TEC BITS 23432
247 my tec bits 23243
355 My Tec Bits 23424
454 My Tec BitS 23432
Then, the output should be 6, because, with white space removed and irrespective of case, all those strings are identical.
Is there any grep() equivalent in SQL as there is in R?
If your concern is only with the SPACE and the CASE, then you need two functions:
REPLACE
UPPER/LOWER
For example,
SQL> WITH DATA AS(
2 SELECT 'MYTECBITS' STR FROM DUAL UNION ALL
3 SELECT 'mytecbits' STR FROM DUAL UNION ALL
4 SELECT 'MY TEC BITS' STR FROM DUAL UNION ALL
5 SELECT 'my tec bits' STR FROM DUAL UNION ALL
6 SELECT 'MY TEC BITS' STR FROM DUAL UNION ALL
7 SELECT 'MY TEC BITS' STR FROM DUAL
8 )
9 SELECT UPPER(REPLACE(STR, ' ', '')) FROM DATA
10 /
UPPER(REPLA
-----------
MYTECBITS
MYTECBITS
MYTECBITS
MYTECBITS
MYTECBITS
MYTECBITS
6 rows selected.
SQL>
You wrote: "Then, the output should be 6".
So, based on that, you need to use it in the filter predicate and COUNT(*) the rows returned:
SQL> WITH DATA AS(
2 SELECT 'MYTECBITS' STR FROM DUAL UNION ALL
3 SELECT 'mytecbits' STR FROM DUAL UNION ALL
4 SELECT 'MY TEC BITS' STR FROM DUAL UNION ALL
5 SELECT 'my tec bits' STR FROM DUAL UNION ALL
6 SELECT 'MY TEC BITS' STR FROM DUAL UNION ALL
7 SELECT 'MY TEC BITS' STR FROM DUAL
8 )
9 SELECT COUNT(*) FROM DATA
10 WHERE UPPER(REPLACE(STR, ' ', '')) = 'MYTECBITS'
11 /
COUNT(*)
----------
6
SQL>
NOTE: The WITH clause is only there to build the sample table for demonstration purposes. In your actual query, remove the entire WITH part, and use your actual table_name in the FROM clause.
So, you just need to do:
SELECT COUNT(*) FROM YOUR_TABLE
WHERE UPPER(REPLACE(STR, ' ', '')) = 'MYTECBITS'
/
You could use something like
UPPER(REPLACE(userString, ' ', ''))
to remove white space and compare in upper case only.
You could convert both sides with LOWER() before comparing them, e.g.
LOWER(column_name) = LOWER(variable)
More specifically:
LOWER(First_name) = LOWER('JoHn DoE')
would compare the lowercased first name against 'john doe'.
For the spacing you should use REPLACE; the format for that is:
REPLACE(yourstring, ' ' , '')
This replaces each space character (' ') with an empty string ('').
So you would do
WHERE LOWER(REPLACE(fieldname, ' ', '')) = 'mytecbits'
You need to use COUNT to return the number of matching rows. LOWER converts the data to lower case so that the comparison ignores case.
To remove spaces, use REPLACE to replace each space with an empty string for your comparison:
Select COUNT(ColumnA)
from table
where Lower(Replace(ColumnB, ' ', '')) = 'mytecbits'
If you are looking for the number of instances of one specific string, irrespective of case / whitespace, then you need to do the following -
ignore whitespace
ignore case
count the number of instances of the string
So you want a query like the following -
SELECT
COUNT(field)
FROM
table
WHERE
UPPER(REPLACE(field, ' ', '')) = UPPER(REPLACE(userstring, ' ', ''))
This counts the number of rows in your table where field is the same as the userstring when case is ignored (both sides are converted to the same case using UPPER, so case is effectively ignored) and spaces are ignored (spaces are removed from the field and the userstring using REPLACE).
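The normalize-then-compare idea behind these answers can be sketched in Python. The six values are the ones from the question; `normalize` is a hypothetical helper that plays the role of UPPER(REPLACE(x, ' ', '')).

```python
def normalize(value: str) -> str:
    """Remove spaces and fold case, like UPPER(REPLACE(value, ' ', ''))."""
    return value.replace(" ", "").upper()


rows = [
    "MYTECBITS",
    "mytecbits",
    "MY TEC BITS",
    "my tec bits",
    "My Tec Bits",
    "My Tec BitS",
]

target = normalize("My Tec Bits")
# Count the rows whose normalized form equals the normalized search string.
matches = sum(1 for value in rows if normalize(value) == target)
```

Normalizing both sides, rather than only the column, means the search string does not need to be typed in any particular case or spacing.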
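Since REGEXP is case insensitive in MySQL (for non-binary strings with a case-insensitive collation), you can obtain a match by making the spaces optional, for example:
SELECT count(field) FROM yourtable WHERE field REGEXP "MY *TEC *BITS";
Note: if needed, you can add a space or a [[:<:]] (word boundary) before "MY" and a space or a [[:>:]] after "BITS" to avoid false positives. In MySQL 8.0.4 and later, which uses ICU regular expressions, these two markers are not supported; use \\b instead.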

count characters matching in CONCAT MySQL REGEXP

I have the following MySQL query which works
SELECT *,
CONCAT( office, ' ', contactperson ) AS bigDataField
FROM webcms_mod_references
HAVING bigDataField REGEXP "one|two"
There is no ORDER BY yet, and:
- if bigDataField contains "one", the row is shown
- if bigDataField contains "one two", the row is shown as well
Right now the id determines which of those is shown first, but I want the row with more matches to be shown first!
I tried with
SUM(
CASE WHEN bigDataField REGEXP "one|two"
THEN 1
ELSE 0 END
) AS matches
But that does not work. Can anyone help me? I think the best approach would be, as the title says, to count the matching characters from the REGEXP. If there are other ways, please explain.
The REGEXP is a user input, so, I'm trying to implement a small search over a small Database.
This is untested while sqlfiddle is down, but you might have to split the REGEXP into two so you can count the matches. REGEXP returns either 0 or 1: either it matched or it didn't. There's no support for finding how many times it matched in a string.
SELECT *,
CONCAT( office, ' ', contactperson ) AS bigDataField
FROM webcms_mod_references
HAVING bigDataField REGEXP "one|two"
ORDER BY ((bigDataField REGEXP "one") + (bigDataField REGEXP "two")) DESC -- parentheses needed: + binds tighter than REGEXP
There is no way to count the number of matches on a regex. What you can do is match them separately and order by each of those matches. For example:
SELECT *,
CONCAT( office, ' ', contactperson ) AS bigDataField
FROM webcms_mod_references
HAVING bigDataField REGEXP "one|two"
ORDER BY
CASE WHEN bigDataField REGEXP "one" AND bigDataField REGEXP "two" THEN 0
ELSE 1 -- The else should catch the "two" alone or the "one" alone because of the filtering
END
Of course, you can use a LIKE here too but maybe your regex are more complex than that :)
When I want to count occurrences of a substring, I remove it with REPLACE and subtract the lengths, for example:
SELECT (
LENGTH('longstringlongtextlongfile') -
LENGTH(REPLACE('longstringlongtextlongfile', 'long', ''))
) / LENGTH('long') AS `occurrences`
I think this is an elegant way to count how many times 'long' appears inside the provided string.
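The arithmetic behind the length-difference trick can be checked with a small Python sketch. `count_occurrences` is a hypothetical helper; it counts non-overlapping occurrences, which is also what removing the substring with REPLACE measures. One assumption worth flagging for the SQL version: MySQL's LENGTH counts bytes, so CHAR_LENGTH is the safer choice for multi-byte text.

```python
def count_occurrences(text: str, needle: str) -> int:
    """Count non-overlapping occurrences of needle via the length difference."""
    if not needle:
        raise ValueError("needle must be non-empty")
    # Characters removed when every occurrence is deleted ...
    removed = len(text) - len(text.replace(needle, ""))
    # ... divided by the length of one occurrence.
    return removed // len(needle)


occurrences = count_occurrences("longstringlongtextlongfile", "long")
```

Here the string has 26 characters, 14 remain after removing 'long', and 12 / 4 gives 3 occurrences.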
This is not exactly an answer to this question, but I think it is closely related (and I hope it will help someone who comes here from Google, etc.).
So if you use PHP (if not, you may not want to keep reading...), you can build the query with it, and in that case you can do this (based on @Moob's great answer):
function buildSearchOrderBy(string $regex, string $columName, string $alternateOrderByColumName): string
{
    // explode() never returns an empty array, so drop empty keywords explicitly.
    $keywords = array_filter(explode('|', $regex), function ($k) { return $k !== ''; });
    if (empty($keywords)) {
        return $alternateOrderByColumName;
    }
    $terms = [];
    foreach ($keywords as $keyword) {
        // NOTE: $keyword is inserted as-is; escape it before using real user input.
        // Each matching keyword scores 100 plus its length.
        $terms[] = "IF((" . $columName . " REGEXP '" . $keyword . "')>0, " . (100 + strlen($keyword)) . ", 0)";
    }
    return '(' . implode(' + ', $terms) . ')';
}
In this case every match is worth 100 plus the number of characters in the matched keyword. Every match starts from 100 so that rows matching more keywords rank first, while among rows with the same number of matches, longer keywords are worth proportionally more.
It is built to check one column, but I think you can extend it easily.
If you copy it to your project, use it like this (just an example):
$orderBy = buildSearchOrderBy($regex, 'article.title', 'article.created');
$statement = "SELECT *
FROM article
WHERE article.title REGEXP '(" . $regex . ")'
ORDER BY " . $orderBy . " DESC"
;
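To see what ordering this scoring produces, here is a minimal Python sketch of the same rule applied to strings in memory. `match_score` is a hypothetical helper; it assumes the pattern is a plain `|`-separated list of keywords and uses case-insensitive matching as an approximation of MySQL's default REGEXP behaviour.

```python
import re


def match_score(text: str, pattern: str) -> int:
    """Score 100 + len(keyword) for every keyword of the pattern found in text."""
    keywords = [k for k in pattern.split("|") if k]  # drop empty alternatives
    return sum(100 + len(k) for k in keywords if re.search(k, text, re.IGNORECASE))


rows = ["one", "nothing here", "one two", "two"]
# Highest score first; rows that match no keyword score 0 and sort last.
ranked = sorted(rows, key=lambda r: match_score(r, "one|two"), reverse=True)
```

With the pattern one|two, the row containing both keywords scores 206 and sorts ahead of the rows that contain only one of them (103 each).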