MySQL: Finding the longest character sequence in a string

In MySQL, how can I find the length of the longest sequence of a given character? For example, take the following string:
1325******2h3n***3k2n*
If I were looking for the * character, the result should be 6 because the chain of 6 * characters is the longest present in the string.

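You can use INSTR together with a number table generated with UNION.
-- Detects runs of up to 10 characters; extend the UNION list to support longer runs.
-- The sample string used here contains a run of five asterisks, so the result is 5.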
select max((instr('1325*****2h3n***3k2n*',repeat('*', times)) != 0) * times ) longest_seq
from (select 1 times union select 2 union select 3 union select 4 union select 5
union select 6 union select 7 union select 8 union select 9 union select 10) t;
Demo:
mysql> select max((instr('1325*****2h3n***3k2n*',repeat('*', times)) != 0) * times ) longest_seq
-> from (select 1 times union select 2 union select 3 union select 4 union select 5
-> union select 6 union select 7 union select 8 union select 9 union select 10) t;
+-------------+
| longest_seq |
+-------------+
|           5 |
+-------------+
1 row in set (0.01 sec)
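If hand-extending the UNION list is a concern, the number table can be generated instead. The following is a minimal sketch, assuming MySQL 8.0 or later (recursive CTEs); the names @s and seq are illustrative and not part of the original answer. It generates the candidate run lengths 1 up to the string length and keeps the largest one that INSTR can find.

```sql
-- Assumption: MySQL 8.0+ (WITH RECURSIVE). @s and seq are illustrative names.
SET @s = '1325******2h3n***3k2n*';

WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    -- Candidate run lengths: 1 up to the length of the string
    SELECT n + 1 FROM seq WHERE n < CHAR_LENGTH(@s)
)
SELECT COALESCE(MAX(n), 0) AS longest_seq   -- 0 when the character never occurs
FROM seq
WHERE INSTR(@s, REPEAT('*', n)) > 0;
-- With the six-asterisk string from the question, this returns 6.
```

The recursion depth is capped by the cte_max_recursion_depth system variable (1000 by default), so strings longer than that would need the limit raised.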

What you're looking for is basically the length of the longest substring made up of a single repeated character;
you can find the algorithm for it here.
Trying to achieve this with a plain query is not a good idea;
I suggest using a stored procedure instead.

Dylan Su's solution is clever and works well if you know the maximum run length is small or you don't want the overhead of creating a function.
On the other hand, either of the following function definitions works for any run length the declared types allow (up to 255 characters here), without having to keep adding UNION branches.
The first function loops over each character in the string and, whenever it matches the repeat character, increments a length counter. It then returns the maximum length seen.
DELIMITER //
CREATE FUNCTION LONGEST_CHARACTER_SEQUENCE(input VARCHAR(255), repeat_character CHAR(1))
RETURNS TINYINT UNSIGNED DETERMINISTIC NO SQL
BEGIN
    DECLARE max_length TINYINT UNSIGNED DEFAULT 0;
    DECLARE length TINYINT UNSIGNED DEFAULT 0;
    DECLARE in_sequence BOOLEAN DEFAULT 0;
    DECLARE position INT DEFAULT 1;
    -- CHAR_LENGTH counts characters (LENGTH counts bytes), matching SUBSTRING's positions
    WHILE position <= CHAR_LENGTH(input) DO
        IF SUBSTRING(input, position, 1) = repeat_character THEN
            -- Extend the current run, or start a new one
            IF in_sequence THEN
                SET length = length + 1;
            ELSE
                SET length = 1;
            END IF;
            IF length > max_length THEN
                SET max_length = length;
            END IF;
            SET in_sequence = 1;
        ELSE
            SET in_sequence = 0;
        END IF;
        SET position = position + 1;
    END WHILE;
    RETURN max_length;
END//
DELIMITER ;
SELECT LONGEST_CHARACTER_SEQUENCE('1325******2h3n***3k2n*', '*');
-- Returns: 6
Inspired by Dylan Su's answer, this alternative increments a length counter until INSTR no longer finds a run of that length. I think it's simpler. It uses the same function name, so drop the first version before creating it.
DROP FUNCTION IF EXISTS LONGEST_CHARACTER_SEQUENCE;
DELIMITER //
CREATE FUNCTION LONGEST_CHARACTER_SEQUENCE(input VARCHAR(255), repeat_character CHAR(1))
RETURNS TINYINT UNSIGNED DETERMINISTIC NO SQL
BEGIN
    DECLARE length TINYINT UNSIGNED DEFAULT 0;
    -- Keep growing the run while a run one character longer still occurs in the input
    WHILE INSTR(input, REPEAT(repeat_character, length + 1)) DO
        SET length = length + 1;
    END WHILE;
    RETURN length;
END//
DELIMITER ;
SELECT LONGEST_CHARACTER_SEQUENCE('1325******2h3n***3k2n*', '*');
-- Also returns: 6

Related

Storing the numbers after the decimal point into a variable in MySQL

Let's say I want to store the digits after the decimal point (76 in the example below) in a variable. How am I supposed to do that?
Here is the scenario:
declare x (3,2);
set x = 323.76;
declare y int;
select cast(substring_index(x, '.', -1) as unsigned) into y;
Any help would be appreciated.
In a procedure, function, or trigger, you can use your code with a small change:
DELIMITER $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `mantisse`()
BEGIN
    DECLARE x DECIMAL(8,2);
    DECLARE y INT;
    SET x = 323.76;
    SELECT CAST(SUBSTRING_INDEX(x, '.', -1) AS UNSIGNED) INTO y;
    INSERT INTO mytable VALUE (y);
END$$
DELIMITER ;
Or, if you want to use it in a plain query, you can use user-defined variables:
SET @x = 323.96;
SELECT CAST(SUBSTRING_INDEX(@x, '.', -1) AS UNSIGNED) INTO @y;
INSERT INTO mytable VALUE (@y);
You already have a string, so use SUBSTRING to get just the first decimal digit (9):
SET @x = 323.96;
SELECT CAST(SUBSTRING(SUBSTRING_INDEX(@x, '.', -1), 1, 1) AS UNSIGNED) INTO @y;
SELECT @y;
INSERT INTO mytable VALUE (@y);
That works inside the procedure too, of course.
You can easily get the decimals from a number using the MOD function:
SET @num = 323.76;
SET @decimals = MOD(@num, 1) * 100;
SELECT @decimals; -- 76.00
MOD(@num, 1) returns the remainder after dividing by 1, which is 0.76; you then only need to multiply it by 100.
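One thing to watch with the MOD approach is negative input: in MySQL the result of MOD takes the sign of the dividend, so the fractional part of a negative number comes back negative. A small sketch, assuming the values have two decimal places as in the question; the derived table and its column name num are illustrative:

```sql
-- Assumption: values have exactly two decimal places, as in the question.
SELECT num,
       -- ABS() keeps the fractional part non-negative before scaling it to an integer
       CAST(MOD(ABS(num), 1) * 100 AS UNSIGNED) AS decimals
FROM (
    SELECT 323.76 AS num
    UNION ALL SELECT -5.55
    UNION ALL SELECT 2.00
) t;
-- decimals: 76, 55, 0
```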
If I'm understanding the specification correctly, it seems rather bizarre. I'd use the SUBSTRING_INDEX function to trim off everything up to and including the dot, but first I would do the math to arrive at a value v with 0 <= v < 1.
Following the MySQL stored program pseudo-code given in the question, something like this:
DECLARE x DECIMAL(5,2);
DECLARE y BIGINT;
SET x := 323.76;
SET y := SUBSTRING_INDEX( ABS(x)-FLOOR(ABS(x)) ,'.',-1) + 0;
There might be a simpler way to do it, but this is an approach that satisfies my understanding of the specification.
As a demonstration of the expression that derives the value of y, consider:
SELECT _x
     , SUBSTRING_INDEX( ABS(_x)-FLOOR(ABS(_x)) ,'.',-1) + 0 AS _y
  FROM ( SELECT 0 AS _x
         UNION ALL SELECT 0.1
         UNION ALL SELECT 2.0
         UNION ALL SELECT 3.3
         UNION ALL SELECT -4.00
         UNION ALL SELECT -5.55
         UNION ALL SELECT 623.76
         UNION ALL SELECT -723.76
       ) t
returns
_x        _y
-------  ---
   0.00    0
   0.10   10
   2.00    0
   3.30   30
  -4.00    0
  -5.55   55
 623.76   76
-723.76   76

SQL query that counts the number of matching characters in two text columns

I need to count how many characters are equal in two text columns (same size, in the same table).
For example:
RowNum  Template         Answers
------  ---------------  ---------------
1       ABCDEABCDEABCDE  ABCDAABCDBABCDC
2       EDAEDAEDAEDAEDA  EDBEDBEDBEDBEDB
SELECT SOME_COUNT_FUNCTION(Template, Answers) should return:
RowNum  Result
------  ------
1       12
2       10
The database is a MySQL.
Not exactly MySQL, but here's something that works in SQL Server. Maybe it will translate over.
DROP TABLE IF EXISTS #tmp
CREATE TABLE #tmp (
    [RowNum] INT IDENTITY(1,1) PRIMARY KEY,
    [Template] NVARCHAR(20),
    [Answer] NVARCHAR(20),
    [Result] INT
)
INSERT INTO #tmp
VALUES ('ABCDEABCDEABCDE','ABCDAABCDBABCDC', NULL),
       ('EDAEDAEDAEDAEDA','EDBEDBEDBEDBEDB', NULL)
--SELECT * FROM #tmp
DECLARE @current_template NVARCHAR(50) -- Variable to hold the current template
      , @current_answer NVARCHAR(50) -- Variable to hold the current answer
      , @template_char CHAR(1) -- Char for template letter
      , @answer_char CHAR(1) -- Char for answer letter
      , @word_index INT -- Index (position) within each word
      , @match_counter INT -- Match counter for each word
      , @max_iter INT = (SELECT TOP 1 RowNum FROM #tmp ORDER BY RowNum DESC) -- Max iterations
      , @row_idx INT = (SELECT TOP 1 RowNum FROM #tmp ORDER BY RowNum) -- Minimum RowNum as initial row index value.
WHILE (@row_idx <= @max_iter)
BEGIN
    SET @match_counter = 0 -- Reset match counter for each row
    SET @word_index = 1 -- Reset word index for each row
    SET @current_template = (SELECT [Template] FROM #tmp WHERE RowNum = @row_idx)
    SET @current_answer = (SELECT [Answer] FROM #tmp WHERE RowNum = @row_idx)
    WHILE (@word_index <= LEN(@current_template))
    BEGIN
        SET @template_char = SUBSTRING(@current_template, @word_index, 1)
        SET @answer_char = SUBSTRING(@current_answer, @word_index, 1)
        IF (@answer_char = @template_char)
        BEGIN
            SET @match_counter += 1
        END
        SET @word_index += 1
    END
    UPDATE #tmp
    SET Result = @match_counter
    WHERE RowNum = @row_idx
    SET @row_idx += 1
END
Get values from the temp table:
SELECT * FROM #tmp
Output:
RowNum  Template         Answer           Result
1       ABCDEABCDEABCDE  ABCDAABCDBABCDC  12
2       EDAEDAEDAEDAEDA  EDBEDBEDBEDBEDB  10
If you are running MySQL 8.0, you can use a recursive query to compare the strings character by character:
with recursive chars as (
    select rownum, template, answers, 1 idx, 0 res from mytable
    union all
    select
        rownum,
        template,
        answers,
        idx + 1,
        res + ( substr(template, idx, 1) = substr(answers, idx, 1) )
    from chars
    where idx <= least(char_length(template), char_length(answers))
)
select rownum, max(res) result from chars group by rownum order by rownum
In the CTE (the with clause), the anchor (the query before union all) selects the whole table. The recursive member (the query after union all) then compares the characters at the current position (idx), increments the result (res) if they match, and advances to the next position, until the shorter string is exhausted. Finally, the outer query just aggregates by rownum.
Demo on DB Fiddle:
rownum | result
-----: | -----:
1 | 12
2 | 10
Please bear in mind that this query will not perform well against a large dataset. Other slightly more efficient solutions exist (typically, using a number table instead of a recursive CTE), but basically, as commented by Gordon Linoff, you do want to fix your data structure if you need to run such queries. You should store each character in a separate row, along with its rownum and its index in the string. Materialize the proper data structure, and then you won't need to generate it on the fly in each and every query.
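For reference, the number-table variant mentioned above could look roughly like this. It is only a sketch: the helper table numbers is hypothetical (it is not part of the original answer) and must contain at least as many rows as the longest string has characters; mytable, rownum, template and answers are the names used in the recursive query.

```sql
-- Hypothetical helper table holding the integers 1..N (N >= longest string length)
CREATE TABLE numbers (n INT UNSIGNED PRIMARY KEY);
INSERT INTO numbers (n) VALUES
    (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15);

-- One joined row per character position; the comparison yields 1 or 0, so SUM counts matches
SELECT t.rownum,
       SUM(SUBSTR(t.template, n.n, 1) = SUBSTR(t.answers, n.n, 1)) AS result
FROM mytable t
JOIN numbers n
  ON n.n <= LEAST(CHAR_LENGTH(t.template), CHAR_LENGTH(t.answers))
GROUP BY t.rownum
ORDER BY t.rownum;
```

Rows where either string is empty drop out of the inner join; a LEFT JOIN with COALESCE(SUM(...), 0) would keep them with a result of 0. The comparison follows the column collation, so it is case-insensitive under MySQL's default collations.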

Selecting values with more than one occurrence of a character in SQL

Let me explain my question with an example
Consider the following column of values
City
-------
Chennai
Delhi
Mumbai
Output I want is
City
-------
Chennai
Mumbai
When you look at the values, 'Chennai' has two 'n's and 'Mumbai' has two 'm's (ignoring case).
What query finds the values that satisfy this condition?
I am using MySQL.
You may be able to use some of the logic from "Count all occurances of different characters in a column" and then filter on the result.
Can you try this? If you want, you can wrap it in a function that accepts the value and the character dynamically. Note that LEN and this IF ... SELECT form are SQL Server syntax; in MySQL the equivalent length function is CHAR_LENGTH.
IF(LEN('Chennai')-LEN(REPLACE('Chennai', 'N', ''))>1 )
Select 'Chennai'
A possible solution if city names contain only Latin letters:
SELECT DISTINCT city
FROM table1 c CROSS JOIN
(
    SELECT 0 n UNION ALL
    SELECT a.N + b.N * 5 + 1 n
    FROM
        (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) a
       ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) b
    ORDER BY n
) n
WHERE CHAR_LENGTH(city) - CHAR_LENGTH(REPLACE(LOWER(city), CHAR(97 + n.n), '')) > 1
Output:
| CITY    |
|---------|
| Mumbai  |
| Chennai |
Here is SQLFiddle demo
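On MySQL 8.0 or later, a regular expression with a backreference may be a shorter way to express the same filter. This is a sketch under two assumptions: that the server's REGEXP is the ICU-based implementation introduced in 8.0 (earlier versions do not support backreferences), and that backslash escapes in string literals are enabled (the default), so '\\1' reaches the regex engine as \1. The table and column names are taken from the query above.

```sql
-- Assumption: MySQL 8.0+ (ICU regular expressions with backreference support)
SELECT city
FROM table1
-- ([a-z]) captures a letter; .* skips anything; \1 requires the same letter again
WHERE LOWER(city) REGEXP '([a-z]).*\\1';
```

As with the number-table query, this only considers the 26 Latin letters, and LOWER makes the check case-insensitive so that 'Mumbai' qualifies.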
You can use a stored procedure for this. Please check my code.
Create table statement:
CREATE TABLE `Cities` (
    `City` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
After adding the cities to the table, I created this procedure:
CREATE PROCEDURE `SP_SplitString`()
BEGIN
    DECLARE front TEXT DEFAULT NULL;
    DECLARE count INT DEFAULT 0;
    DECLARE arrayText longtext default "";
    DECLARE Value longtext DEFAULT "";
    DECLARE val longtext DEFAULT "";
    DECLARE done INT DEFAULT FALSE;
    DECLARE cityCursor CURSOR FOR SELECT * FROM `Cities`;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    OPEN cityCursor;
    loop_through_rows:
    LOOP
        FETCH cityCursor INTO Value;
        IF done THEN
            LEAVE loop_through_rows;
        END IF;
        SET val = Value;
        -- Walk through the city name one leading character at a time
        iterator:
        LOOP
            IF LENGTH(TRIM(val)) = 0 OR val IS NULL THEN
                LEAVE iterator;
            END IF;
            SET front = LOWER(SUBSTRING(val,1,1));
            -- Number of times the current character occurs in the whole name
            SET count = LENGTH(Value) - LENGTH(REPLACE(LOWER(Value), front, ''));
            IF count > 1 THEN
                IF LENGTH(TRIM(arrayText)) = 0 THEN
                    SET arrayText = Value;
                ELSE
                    SET arrayText = CONCAT(arrayText,",",Value);
                END IF;
                LEAVE iterator;
            END IF;
            IF LENGTH(TRIM(val)) > 1 THEN
                SET val = SUBSTRING(val,2,LENGTH(TRIM(val)));
            ELSE
                SET val = "";
            END IF;
        END LOOP;
    END LOOP;
    CLOSE cityCursor;
    SELECT * FROM `Cities` WHERE FIND_IN_SET(City, arrayText);
END

Fetch the occurrences of particular words in particular column of a table

I have about 200 words, and I want to see how many times each of them occurs in a column of a table.
For example, say we have a table test with a column statements that has two rows:
How are you. It's been long since I met you.
I am fine how are you.
Now I want to find the occurrences of the words "you" and "how". The output should be something like:
word  count
you   3
how   2
since "you" has 3 and "how" has 2 occurrences across the two rows.
How can I do this?
You can do it like this:
1. Split each phrase into words and put all the items in a separate table.
2. Remove all punctuation.
3. Run a SELECT against the created table for the words that you want to identify.
The way I would approach this is to write a small user-defined function that returns the number of times one string appears in another, with some allowances for:
- upper and lower case
- common punctuation
I would then create a table with all of the words that I wish to search for, i.e. your list of 200. Then use the function to count the number of occurrences of each word in every phrase, put that in an inline view, and sum the results by search word.
Hence:
User-defined function
DELIMITER $$
CREATE FUNCTION `get_word_count`(phrase VARCHAR(500),word VARCHAR(255), delimiter VARCHAR(1)) RETURNS int(11)
READS SQL DATA
BEGIN
    DECLARE cur_position INT DEFAULT 1 ;
    DECLARE remainder TEXT;
    DECLARE cur_string VARCHAR(255);
    DECLARE delimiter_length TINYINT UNSIGNED;
    DECLARE total INT;
    DECLARE result DOUBLE DEFAULT 0;
    DECLARE string2 VARCHAR(255);
    SET remainder = replace(phrase,'!',' ');
    SET remainder = replace(remainder,'.',' ');
    SET remainder = replace(remainder,',',' ');
    SET remainder = replace(remainder,'?',' ');
    SET remainder = replace(remainder,':',' ');
    SET remainder = replace(remainder,'(',' ');
    SET remainder = lower(remainder);
    SET string2 = concat(delimiter,trim(word),delimiter);
    SET delimiter_length = CHAR_LENGTH(delimiter);
    SET cur_position = 1;
    WHILE CHAR_LENGTH(remainder) > 0 AND cur_position > 0 DO
        SET cur_position = INSTR(remainder, delimiter);
        IF cur_position = 0 THEN
            SET cur_string = remainder;
        ELSE
            SET cur_string = concat(delimiter,LEFT(remainder, cur_position - 1),delimiter);
        END IF;
        IF TRIM(cur_string) != '' THEN
            set result = result + (select instr(string2,cur_string) > 0);
        END IF;
        SET remainder = SUBSTRING(remainder, cur_position + delimiter_length);
    END WHILE;
    RETURN result;
END$$
DELIMITER ;
You might have to play with this function a little depending on what allowances you need to make for punctuation and case. Hopefully you get the idea here though!
Populate tables
create table search_word
(id int unsigned primary key auto_increment,
word varchar(250) not null
);
insert into search_word (word) values ('you');
insert into search_word (word) values ('how');
insert into search_word (word) values ('to');
insert into search_word (word) values ('too');
insert into search_word (word) values ('the');
insert into search_word (word) values ('and');
insert into search_word (word) values ('world');
insert into search_word (word) values ('hello');
create table phrase_to_search
(id int unsigned primary key auto_increment,
phrase varchar(500) not null
);
insert into phrase_to_search (phrase) values ("How are you. It's been long since I met you");
insert into phrase_to_search (phrase) values ("I am fine how are you?");
insert into phrase_to_search (phrase) values ("Oh. Not bad. All is ok with the world, I think");
insert into phrase_to_search (phrase) values ("I think so too!");
insert into phrase_to_search (phrase) values ("You know what? I think so too!");
Run Query
select word,sum(word_count) as total_word_count
from
(
    select phrase,word,get_word_count(phrase,word," ") as word_count
    from search_word
    join phrase_to_search
) t
group by word
order by total_word_count desc;
Here is a solution:
SELECT SUM(total_count) as total, value
FROM (
    SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
    FROM (
        SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
        FROM table_name t CROSS JOIN
        (
            SELECT a.N + b.N * 10 + 1 n
            FROM
                (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
               ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
            ORDER BY n
        ) n
        WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
        ORDER BY value
    ) AS x
    GROUP BY x.value
) AS y
GROUP BY value
Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1
First we run a query to extract all words, as explained here by @peterm (follow his instructions if you want to change the total number of words processed). Then we turn that into a subquery and COUNT and GROUP BY the value of each word. Finally, we wrap that in another query whose GROUP BY merges words that differ only by attached punctuation, using REPLACE (e.g. hello = hello!).
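Below is a simple solution for the case where you only need counts for certain words rather than complete statistics. Note that each query counts the rows that contain the word (as a substring), not the total number of occurrences: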
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%how%';
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%you%';
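To count occurrences rather than matching rows, a common trick is to compare the length of the text before and after removing the word. The sketch below uses the table test and column statements from the question; the inline word list w is an assumption standing in for the real list of 200 words. It matches substrings, so 'you' is also counted inside 'your'.

```sql
SELECT w.word,
       -- Characters removed by REPLACE, divided by the word length = number of occurrences
       SUM((CHAR_LENGTH(t.statements)
            - CHAR_LENGTH(REPLACE(LOWER(t.statements), w.word, '')))
           DIV CHAR_LENGTH(w.word)) AS occurrences
FROM test t
CROSS JOIN (SELECT 'you' AS word UNION ALL SELECT 'how') w   -- words must be lowercase
GROUP BY w.word;
-- For the two sample rows in the question: you = 3, how = 2
```

REPLACE is case-sensitive in MySQL, which is why the text is lowercased first and the search words are given in lowercase.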

MySQL limit by sum

I want to limit my SELECT results in MySQL by sum.
For Example, this is my table:
(id, val)
Data Entries:
(1,100),
(2,300),
(3,50),
(4,3000)
I want to select first k entries such that the sum of val in those entries is just enough to make it to M.
For example, I want to find entries such that M = 425.
The result should be (1,100),(2,300),(3,50).
How can I do that in a MySQL SELECT query?
Try this variant:
SET @sum = 0;
SELECT id, val FROM (
    SELECT *, @sum := @sum + val AS mysum FROM mytable2 ORDER BY id
) t
WHERE mysum - val < 425; -- keep rows up to and including the one that reaches M = 425
+------+------+
| id   | val  |
+------+------+
|    1 |  100 |
|    2 |  300 |
|    3 |   50 |
+------+------+
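On MySQL 8.0 or later, a window function can compute the running total without user variables. This is a sketch, assuming the same mytable2 table as above and M = 425; the alias running_sum is illustrative. A row is kept when the total before it is still below M, which includes the row that first reaches M.

```sql
-- Assumption: MySQL 8.0+ (window functions)
SELECT id, val
FROM (
    SELECT id,
           val,
           SUM(val) OVER (ORDER BY id) AS running_sum   -- cumulative total in id order
    FROM mytable2
) t
WHERE running_sum - val < 425   -- total before this row has not reached M yet
ORDER BY id;
-- For the sample data this returns (1,100), (2,300), (3,50)
```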
This stored procedure might help:
DELIMITER ;;
CREATE PROCEDURE selectLimitBySum (IN m INT)
BEGIN
    DECLARE mTmp INT DEFAULT 0;
    DECLARE idTmp INT DEFAULT 0;
    DECLARE valTmp INT DEFAULT 0;
    DECLARE doneLoop SMALLINT DEFAULT 0;
    -- ORDER BY makes "first k entries" well defined
    DECLARE crsSelect CURSOR FOR SELECT id, val FROM test3 ORDER BY id;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET doneLoop = 1;
    OPEN crsSelect;
    aloop: LOOP
        SET idTmp = 0;
        SET valTmp = 0;
        FETCH crsSelect INTO idTmp, valTmp;
        IF doneLoop THEN
            LEAVE aloop;
        END IF;
        -- Each fetched row is returned as its own result set
        SELECT idTmp, valTmp;
        SET mTmp = mTmp + valTmp;
        IF mTmp > m THEN
            LEAVE aloop;
        END IF;
    END LOOP;
    CLOSE crsSelect;
END ;;
DELIMITER ;
Please feel free to change the table names or variable names as per your needs.
From the MySQL reference manual:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
So you cannot use LIMIT the way you proposed. To achieve what you want, you need to read the result set row by row in your application (Java, C, PHP, or whatever else) and stop when your condition is reached.
Alternatively, you can use a prepared statement, but even then you can't have a conditional limit (it must be a constant value), so it is not exactly what you asked for.
create table #limit(
    id int,
    val int
)
declare @sum int, @id int, @val int, @m int;
set @sum=0;
set @m=250; -- target sum (M)
declare limit_cursor cursor for
select id, val from your_table order by id
open limit_cursor
fetch next from limit_cursor into @id, @val
while(@@fetch_status=0)
begin
    if(@sum<@m)
    begin
        set @sum = @sum+@val;
        INSERT INTO #limit values (@id, @val);
        fetch next from limit_cursor into @id, @val
    end
    else
    begin
        goto case1;
    end
end
case1:
close limit_cursor
deallocate limit_cursor
select * from #limit
truncate table #limit