Count the number of occurrences of a string in a VARCHAR field? - mysql

I have a table like this:
TITLE | DESCRIPTION
------------------------------------------------
test1 | value blah blah value
test2 | value test
test3 | test test test
test4 | valuevaluevaluevaluevalue
I am trying to figure out how to return the number of times a string occurs in each of the DESCRIPTION's.
So, if I want to count the number of times 'value' appears, the sql statement will return this:
TITLE | DESCRIPTION | COUNT
------------------------------------------------------------
test1 | value blah blah value | 2
test2 | value test | 1
test3 | test test test | 0
test4 | valuevaluevaluevaluevalue | 5
Is there any way to do this? I do not want to use php at all, just mysql.

This should do the trick:
SELECT
title,
description,
ROUND (
(
LENGTH(description)
- LENGTH( REPLACE ( description, "value", "") )
) / LENGTH("value")
) AS count
FROM <table>

A little bit simpler and more effective variation of #yannis solution:
SELECT
title,
description,
CHAR_LENGTH(description) - CHAR_LENGTH( REPLACE ( description, 'value', '1234') )
AS `count`
FROM <table>
The difference is that I replace the "value" string with a 1-char shorter string ("1234" in this case). This way you don't need to divide and round to get an integer value.
Generalized version (works for every needle string):
SET #needle = 'value';
SELECT
description,
CHAR_LENGTH(description) - CHAR_LENGTH(REPLACE(description, #needle, SPACE(LENGTH(#needle)-1)))
AS `count`
FROM <table>

try this:
select TITLE,
(length(DESCRIPTION )-length(replace(DESCRIPTION ,'value','')))/5 as COUNT
FROM <table>
SQL Fiddle Demo

In SQL SERVER, this is the answer
Declare #t table(TITLE VARCHAR(100), DESCRIPTION VARCHAR(100))
INSERT INTO #t SELECT 'test1', 'value blah blah value'
INSERT INTO #t SELECT 'test2','value test'
INSERT INTO #t SELECT 'test3','test test test'
INSERT INTO #t SELECT 'test4','valuevaluevaluevaluevalue'
SELECT TITLE,DESCRIPTION,Count = (LEN(DESCRIPTION) - LEN(REPLACE(DESCRIPTION, 'value', '')))/LEN('value')
FROM #t
Result
TITLE DESCRIPTION Count
test1 value blah blah value 2
test2 value test 1
test3 test test test 0
test4 valuevaluevaluevaluevalue 5
I don't have MySQL install, but goggled to find the Equivalent of LEN is LENGTH while REPLACE is same.
So the equivalent query in MySql should be
SELECT TITLE,DESCRIPTION, (LENGTH(DESCRIPTION) - LENGTH(REPLACE(DESCRIPTION, 'value', '')))/LENGTH('value') AS Count
FROM <yourTable>
Please let me know if it worked for you in MySql also.

Here is a function that will do that.
CREATE FUNCTION count_str(haystack TEXT, needle VARCHAR(32))
RETURNS INTEGER DETERMINISTIC
BEGIN
RETURN ROUND((CHAR_LENGTH(haystack) - CHAR_LENGTH(REPLACE(haystack, needle, ""))) / CHAR_LENGTH(needle));
END;

This is the mysql function using the space technique (tested with mysql 5.0 + 5.5):
CREATE FUNCTION count_str( haystack TEXT, needle VARCHAR(32))
RETURNS INTEGER DETERMINISTIC
RETURN LENGTH(haystack) - LENGTH( REPLACE ( haystack, needle, space(char_length(needle)-1)) );

SELECT
id,
jsondata,
ROUND (
(
LENGTH(jsondata)
- LENGTH( REPLACE ( jsondata, "sonal", "") )
) / LENGTH("sonal")
)
+
ROUND (
(
LENGTH(jsondata)
- LENGTH( REPLACE ( jsondata, "khunt", "") )
) / LENGTH("khunt")
)
AS count1 FROM test ORDER BY count1 DESC LIMIT 0, 2
Thanks Yannis, your solution worked for me and here I'm sharing same solution for multiple keywords with order and limit.

In most cases, these functions are LENGTH and REPLACE, respectively (SQL Server users will use the built-in function LEN rather than LENGTH):
Example, count num of comma in the string "10,CLARK,MANAGER"
select (length('10,CLARK,MANAGER')-
length(replace('10,CLARK,MANAGER',',','')))/length(',')
as cnt from t1

Related

Teradata SQL Split Single String into Table Rows

I have one string element, for example : "(1111, Tem1), (0000, Tem2)" and hope to generate a data table such as
var1
var2
1111
Tem1
0000
Tem2
This is my code, I created the lag token and filter with odd rows element.
with var_ as (
select '(1111, Tem1), (0000, Tem2)' as pattern_
)
select tbb1.*, tbb2.result_string as result_string_previous
from(
select tb1.*,
min(token) over(partition by 1 order by token asc rows between 1 preceding and 1 preceding) as min_token
from
table (
strtok_split_to_table(1, var_.pattern_, '(), ')
returns (outkey INTEGER, token INTEGER, result_string varchar(20))
) as tb1) tbb1
inner join (select min_token, result_string from tbb1) tbb2
on tbb1.token = tbb2.min_token
where (token mod 2) = 0;
But it seems that i can't generate new variables in "from" step and applied it directly in "join" step.
so I wanna ask is still possible to get the result what i want in my procedure? or is there any suggestion?
Thanks for all your assistance.
I wouldn't split / recombine the groups. Split each group to a row, then split the values within the row, e.g.
with var_ as (
select '(1111, Tem1), (0000, Tem2)' as pattern_
),
split1 as (
select trim(leading '(' from result_string) as string_
from
table ( /* split at & remove right parenthesis */
regexp_split_to_table(1, var_.pattern_, '\)((, )|$)','c')
returns (outkey INTEGER, token_nbr INTEGER, result_string varchar(256))
) as tb1
)
select *
from table(
csvld(split1.string_, ',', '"')
returns (var1 VARCHAR(16), var2 VARCHAR(16))
) as tb2
;

Extracting more than one date from a text field

I have a SQL query that I am using that works perfectly for what I am needing at extracting the first date in a text field. This is a free form text field that is tied to a status that when the right conditions align my query looks for a date and if it is in the correct format it extracts this date.
Sometimes the date will be input as a range of dates or a comma separated list of dates. I would like to know if there is a way to extract the last date in the case of a date range or the other dates in a list of dates?
The current query has 3 steps in temp tables for extracting the date here are snippets for each step.
In the first step it looks for the word 'proposed' and grabs a number of characters after:
,SUBSTRING(al.Comments,
PATINDEX('%proposed%',al.Comments)+9,17) [DateFirstPass]
In the second step the query it extracts the date:
,LEFT(
SUBSTRING(p1.DateFirstPass, PATINDEX('%[0-9/]%', p1.DateFirstPass), 10), --string (only numbers and forward slash) MAX of 10 chars (mm/dd/yyyy)
PATINDEX('%[^0-9/]%', SUBSTRING(p1.DateFirstPass, PATINDEX('%[0-9/]%', p1.DateFirstPass), 10) + 'X')-1) --char length after negating nonvalid characters '%[^0-9/]%'
[DateSecondPass]
In the last pass it adds in the year if it is missing:
CASE
WHEN ISDATE(p2.DateSecondPass) = 1
THEN CAST(p2.DateSecondPass AS DATE)
WHEN ISDATE(p2.DateSecondPass + '/' + this.yr) = 1
THEN p2.DateSecondPass + '/' + this.yr --Adds missing year
END [DateThirdPass]
FROM
#ProposedDateParse2 p2
CROSS JOIN (VALUES (CAST(YEAR(GETDATE()) AS varchar(4)))) this(yr)
Grab a copy of patternSplitCM
Here's the code:
-- PatternSplitCM will split a string based on a pattern of the form
-- supported by LIKE and PATINDEX
--
-- Created by: Chris Morris 12-Oct-2012
CREATE FUNCTION dbo.PatternSplitCM
(
#List VARCHAR(8000) = NULL,
#Pattern VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
WITH numbers AS (
SELECT TOP(ISNULL(DATALENGTH(#List), 0))
n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
(VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n))
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
Item = SUBSTRING(#List,MIN(n),1+MAX(n)-MIN(n)),
[Matched]
FROM (
SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n)
FROM numbers
CROSS APPLY (
SELECT [Matched] = CASE WHEN SUBSTRING(#List,n,1) LIKE #Pattern THEN 1 ELSE 0 END
) y
) d
GROUP BY [Matched], Grouper
Solution against a variable:
DECLARE #string varchar(8000) =
'Blah blah 3/1/2017, 12/19/2018,1/2/2020,1111/11111/1111/111/111 blah blah';
SELECT TOP (1) Item
FROM dbo.patternSplitCM(#string, '[0-9/]')
WHERE [Matched] = 1 AND ISDATE(item) = 1
ORDER BY -ItemNumber;
Results:
item
------
1/2/2020
Example against a table:
DECLARE #table table (someid int identity, sometext varchar(1000));
INSERT #table(sometext) VALUES
('Blah blah 3/1/2017, 12/19/2018,1/2/2020,1111/11111/1111/111/111 blah blah'),
('Yada yada 1/1/12, 12/31/1999 call me at 555-1212!');
SELECT t.someid, getLastDate.Item
FROM #table t
CROSS APPLY
(
SELECT TOP (1) Item
FROM dbo.patternSplitCM(t.sometext, '[0-9/]')
WHERE [Matched] = 1 AND ISDATE(item) = 1
ORDER BY -ItemNumber
) getLastDate;
Results:
someid Item
----------- -----------
1 1/2/2020
2 12/31/1999

SQL: select unique substrings from the table by mask

There is a SQL table mytable that has a column mycolumn.
That column has text inside each cell. Each cell may contain "this.text/31/" or "this.text/72/" substrings (numbers in that substrings can be any) as a part of string.
What SQL query should be executed to display a list of unique such substrings?
P.S. Of course, some cells may contain several such substrings.
And here are the answers for questions from the comments:
The query supposed to work on SQL Server.
The prefered output should contain the whole substring, not the numeric part only. It actually could be not just the number between first "/" and the second "/".
And it is varchar type (probably)
Example:
mycolumn contains such values:
abcd/eftthis.text/31/sadflh adslkjh
abcd/eftthis.text/44/khjgb ljgnkhj this.text/447/lhkjgnkjh
ljgkhjgadsvlkgnl
uygouyg/this.text/31/luinluinlugnthis.text/31/ouygnouyg
khjgbkjyghbk
The query should display:
this.text/31/
this.text/44/
this.text/447/
How about using a recursive CTE:
CREATE TABLE #myTable
(
myColumn VARCHAR(100)
)
INSERT INTO #myTable
VALUES
('abcd/eftthis.text/31/sadflh adslkjh'),
('abcd/eftthis.text/44/khjgb ljgnkhj this.text/447/lhkjgnkjh'),
('ljgkhjgadsvlkgnl'),
('uygouyg/this.text/31/luinluinlugnthis.text/31/ouygnouyg'),
('khjgbkjyghbk')
;WITH CTE
AS
(
SELECT MyColumn,
CHARINDEX('this.text/', myColumn, 0) AS startPos,
CHARINDEX('/', myColumn, CHARINDEX('this.text/', myColumn, 1) + 10) AS endPos
FROM #myTable
WHERE myColumn LIKE '%this.text/%'
UNION ALL
SELECT T1.MyColumn,
CHARINDEX('this.text/', T1.myColumn, C.endPos) AS startPos,
CHARINDEX('/', T1.myColumn, CHARINDEX('this.text/', T1.myColumn, c.endPos) + 10) AS endPos
FROM #myTable T1
INNER JOIN CTE C
ON C.myColumn = T1.myColumn
WHERE SUBSTRING(T1.MyColumn, C.EndPos, 100) LIKE '%this.text/%'
)
SELECT DISTINCT SUBSTRING(myColumn, startPos, EndPos - startPos)
FROM CTE
Having a table named test with the following data:
COLUMN1
aathis.text/31/
this.text/1/
bbbthis.text/72/sksk
could this be what you are looking for?
select SUBSTR(COLUMN1,INSTR(COLUMN1,'this.text', 1 ),INSTR(COLUMN1,'/',INSTR(COLUMN1,'this.text', 1 )+10) - INSTR(COLUMN1,'this.text', 1 )+1) from test;
result:
this.text/31/
this.text/1/
this.text/72/
i see your problem:
Assume the same table as above but now with the following data:
this.text/77/
xxthis.text/33/xx
xthis.text/11/xxthis.text/22/x
xthis.text/1/x
The following might help you:
SELECT SUBSTR(COLUMN1, INSTR(COLUMN1,'this.text', 1 ,1), INSTR(COLUMN1,'/',INSTR(COLUMN1,'this.text', 1 ,1)+10) - INSTR(COLUMN1,'this.text', 1 ,1)+1) FROM TEST
UNION
SELECT CASE WHEN (INSTR(COLUMN1,'this.text', 1,2 ) >0) THEN
SUBSTR(COLUMN1, INSTR(COLUMN1,'this.text', 1,2 ), INSTR(COLUMN1,'/',INSTR(COLUMN1,'this.text', 1 ,2),2) - INSTR(COLUMN1,'this.text', 1,2 )+1) end FROM TEST;
it will generate the following result:
this.text/1/
this.text/11/
this.text/22/
this.text/33/
this.text/77/
The downside is that you need to add a select statement for every occurance you might have of "this.text". If you might have 100 "this.text" in the same cell it might be a problem.
SQL> select SUBSTR(column_name,1,9) from tablename;
column_name
this.text
SELECT REGEXP_SUBSTR(column_name,'this.text/[[:digit:]]+/')
FROM table_name

MySQL Query: limit result respecting substring count

I have a table like this
ROW ID | CONTENT
------------------------------------------------
test1 | foo, foo, foo
test2 | bar, bar
test3 | foo, foo
test4 | foo, foo, foo, foo
What I want to achieve is query that gives me the rows but limiting it respecting the occurrences of a substring.
Some examples could be:
Limit result to 3 "foo" occurrences -> should return test1
Limit result to 4 "foo" occurrences -> should return test1 and test3
Limit result to 100 "foo" occurrences -> should return test1,test3, test4
Limit result to 7 "foo" occurrences -> should also return test1,test3, test4
Is there any way to do this? Thanks in advance!
P.S. : I should have mentioned that the ',' can be any string without a predictable length.
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE Table1
(`ROW ID` varchar(5), `CONTENT` varchar(18))
;
INSERT INTO Table1
(`ROW ID`, `CONTENT`)
VALUES
('test1', 'foo, foo, foo'),
('test2', 'bar, bar'),
('test3', 'foo, foo'),
('test4', 'foo, foo, foo, foo')
;
Query 1:
SELECT *
FROM Table1
WHERE ((LENGTH(CONTENT) -
LENGTH(REPLACE(CONTENT, ',', ''))) + 1) < 3
AND SUBSTRING(CONTENT,1,LENGTH('FOO')) = 'FOO'
Results:
| ROW ID | CONTENT |
|--------|----------|
| test3 | foo, foo |
EDIT :
If you are dealing with phrases, it could look like this :
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE Table1
(`ROW ID` varchar(5), `CONTENT` varchar(48))
;
INSERT INTO Table1
(`ROW ID`, `CONTENT`)
VALUES
('test1', 'foo de foo refe foo'),
('test2', 'bar re bar'),
('test3', 'foo rer ef foo'),
('test4', 'foo rer foo fsdfs foo dfsfe foo')
;
Query 1:
SELECT *
FROM Table1
WHERE (LENGTH(CONCAT(' ',CONTENT,' ')) -
LENGTH(REPLACE(CONCAT(' ',UPPER(CONTENT),' '),
CONCAT(' ','FOO',' '), '')))
/(LENGTH('FOO')+2) < 3 AND
CONCAT(' ',CONTENT,' ') LIKE CONCAT('% ','FOO',' %')
Results:
| ROW ID | CONTENT |
|--------|----------------|
| test3 | foo rer ef foo |
You want to count the number of foos in the list. This is pretty easy:
select t.*
from t
where (char_length(concat(', ', content, ', ')) -
char_length(replace(concat(', ', content, ', '), ', foo, ', '1234567'))
) = 3;
The idea is to replace 'foo' with something that has one fewer character. However, you might want to be careful with 'foobars' and 'barfood' and other strings that could cause a false positive. So, this version just puts the separators at the beginning and end of the string.
Once you have this information, you can do whatever comparisons you would like.
MySQL unfortunately doesn't have any bulit-in function for what you want to do. You need something like SUBSTRING_COUNT, which doesn't exist. What you can do is, based on this answer` calculate that value.
Something like this might work:
SELECT rowid,
(LENGTH(content) - LENGTH(REPLACE(content, 'foo', ''))) / LENGTH('foo') AS cnt
FROM thetable
HAVING cnt > 0 && cnt < 4;
DEMO: http://sqlfiddle.com/#!2/10599/7

Detect if value is number in MySQL

Is there a way to detect if a value is a number in a MySQL query? Such as
SELECT *
FROM myTable
WHERE isANumber(col1) = true
You can use Regular Expression too... it would be like:
SELECT * FROM myTable WHERE col1 REGEXP '^[0-9]+$';
Reference:
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
This should work in most cases.
SELECT * FROM myTable WHERE concat('',col1 * 1) = col1
It doesn't work for non-standard numbers like
1e4
1.2e5
123. (trailing decimal)
If your data is 'test', 'test0', 'test1111', '111test', '111'
To select all records where the data is a simple int:
SELECT *
FROM myTable
WHERE col1 REGEXP '^[0-9]+$';
Result: '111'
(In regex, ^ means begin, and $ means end)
To select all records where an integer or decimal number exists:
SELECT *
FROM myTable
WHERE col1 REGEXP '^[0-9]+\\.?[0-9]*$'; - for 123.12
Result: '111' (same as last example)
Finally, to select all records where number exists, use this:
SELECT *
FROM myTable
WHERE col1 REGEXP '[0-9]+';
Result: 'test0' and 'test1111' and '111test' and '111'
SELECT * FROM myTable
WHERE col1 REGEXP '^[+-]?[0-9]*([0-9]\\.|[0-9]|\\.[0-9])[0-9]*(e[+-]?[0-9]+)?$'
Will also match signed decimals (like -1.2, +0.2, 6., 2e9, 1.2e-10).
Test:
drop table if exists myTable;
create table myTable (col1 varchar(50));
insert into myTable (col1)
values ('00.00'),('+1'),('.123'),('-.23e4'),('12.e-5'),('3.5e+6'),('a'),('e6'),('+e0');
select
col1,
col1 + 0 as casted,
col1 REGEXP '^[+-]?[0-9]*([0-9]\\.|[0-9]|\\.[0-9])[0-9]*(e[+-]?[0-9]+)?$' as isNumeric
from myTable;
Result:
col1 | casted | isNumeric
-------|---------|----------
00.00 | 0 | 1
+1 | 1 | 1
.123 | 0.123 | 1
-.23e4 | -2300 | 1
12.e-5 | 0.00012 | 1
3.5e+6 | 3500000 | 1
a | 0 | 0
e6 | 0 | 0
+e0 | 0 | 0
Demo
Returns numeric rows
I found the solution with following query and works for me:
SELECT * FROM myTable WHERE col1 > 0;
This query return rows having only greater than zero number column that col1
Returns non numeric rows
if you want to check column not numeric try this one with the trick (!col1 > 0):
SELECT * FROM myTable WHERE !col1 > 0;
This answer is similar to Dmitry, but it will allow for decimals as well as positive and negative numbers.
select * from table where col1 REGEXP '^[[:digit:]]+$'
use a UDF (user defined function).
CREATE FUNCTION isnumber(inputValue VARCHAR(50))
RETURNS INT
BEGIN
IF (inputValue REGEXP ('^[0-9]+$'))
THEN
RETURN 1;
ELSE
RETURN 0;
END IF;
END;
Then when you query
select isnumber('383XXXX')
--returns 0
select isnumber('38333434')
--returns 1
select isnumber(mycol) mycol1, col2, colx from tablex;
-- will return 1s and 0s for column mycol1
--you can enhance the function to take decimals, scientific notation , etc...
The advantage of using a UDF is that you can use it on the left or right side of your "where clause" comparison. this greatly simplifies your SQL before being sent to the database:
SELECT * from tablex where isnumber(columnX) = isnumber('UnkownUserInput');
hope this helps.
Another alternative that seems faster than REGEXP on my computer is
SELECT * FROM myTable WHERE col1*0 != col1;
This will select all rows where col1 starts with a numeric value.
Still missing this simple version:
SELECT * FROM myTable WHERE `col1` + 0 = `col1`
(addition should be faster as multiplication)
Or slowest version for further playing:
SELECT *,
CASE WHEN `col1` + 0 = `col1` THEN 1 ELSE 0 END AS `IS_NUMERIC`
FROM `myTable`
HAVING `IS_NUMERIC` = 1
You can use regular expression for the mor detail https://dev.mysql.com/doc/refman/8.0/en/regexp.html
I used this ^([,|.]?[0-9])+$. This is allows handle to the decimal and float number
SELECT
*
FROM
mytable
WHERE
myTextField REGEXP "^([,|.]?[0-9])+$"
I recommend: if your search is simple , you can use `
column*1 = column
` operator interesting :) is work and faster than on fields varchar/char
SELECT * FROM myTable WHERE column*1 = column;
ABC*1 => 0 (NOT EQU **ABC**)
AB15*A => 15 (NOT EQU **AB15**)
15AB => 15 (NOT EQU **15AB**)
15 => 15 (EQUALS TRUE **15**)
SELECT * FROM myTable WHERE sign (col1)!=0
ofcourse sign(0) is zero, but then you could restrict you query to...
SELECT * FROM myTable WHERE sign (col1)!=0 or col1=0
UPDATE: This is not 100% reliable, because "1abc" would return sign of
1, but "ab1c" would return zero... so this could only work for text that does not begins with numbers.
you can do using CAST
SELECT * from tbl where col1 = concat(cast(col1 as decimal), "")
I have found that this works quite well
if(col1/col1= 1,'number',col1) AS myInfo
Try Dividing /1
select if(value/1>0 or value=0,'its a number', 'its not a number') from table