Use of Regex in SQL query - mysql

create table numbers (number varchar(10));
insert into numbers (number) values
('9999999999'),
('5532003644'),
('1212121205'),
('1103220311'),
('1121212128'),
('1234123425');
Trying to SELECT only XY-XY-XY series from the database:
SELECT * FROM numbers
where number regexp '(.*([0-9])\2.*){3}'
This gives me the results:
1212121205, 1121212128 and 1234123425
How is 1234123425 an XY-XY-XY series?

All your questions are interesting SQL puzzles.
This solution, too, does not involve regex:
select distinct n.number
from (
select 1 start union all select 2 union all select 3 union all
select 4 union all select 5
) s cross join numbers n
where
left(substring(n.number, s.start, 2), 1) <> right(substring(n.number, s.start, 2), 1)
and
n.number like concat(
'%', substring(n.number, s.start, 2),
substring(n.number, s.start, 2),
'%', substring(n.number, s.start, 2), '%'
)
See the demo.
Results:
| number |
| ---------- |
| 1212121205 |
| 1121212128 |

Part of the problem is that you're not escaping the backslash. Backslash is both a string escape and a regexp escape; to get it into the regexp engine, you need to escape it for the string parser by writing \\2. Otherwise, \2 is treated as simply 2, so ([0-9])\2 matches any digit followed by 2. That is why 1234123425 shows up in your results: it contains 12, 12 and 42.
But a back-reference is not what you want here anyway. \2 will match whatever ([0-9]) matched, which will make your code look for XX, not XY. I don't think there's a way to write a regexp where you match any character other than the back-reference.
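To check the XX versus XY point outside the database, here is a minimal sketch using Python's re module rather than MySQL. It assumes an engine with back-references and negative lookahead; whether your MySQL version accepts the same syntax is something you would have to test. The pattern names are mine, and "XY three times" is read here as the same pair of different digits occurring three times anywhere in the number.

```python
import re

numbers = ["9999999999", "5532003644", "1212121205",
           "1103220311", "1121212128", "1234123425"]

# (\d)\1 matches a digit followed by the SAME digit, i.e. an "XX" pair.
double_digit = re.compile(r"(\d)\1")

# (\d)(?!\1)(\d) matches two DIFFERENT digits "XY";
# (?:.*\1\2){2} then requires the same pair two more times.
xy_three_times = re.compile(r"(\d)(?!\1)(\d)(?:.*\1\2){2}")

has_xx = [n for n in numbers if double_digit.search(n)]
has_xy3 = [n for n in numbers if xy_three_times.search(n)]

print(has_xx)   # numbers containing a doubled digit
print(has_xy3)  # numbers containing the same XY pair three times
```

On the sample data the second pattern keeps only 1212121205 and 1121212128, which is the result the question expects.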

This is not quite an answer, but you can do this with regular expressions. The problem is that MySQL does not support back references. Postgres does, so the following does what you intend:
SELECT *
FROM numbers
WHERE number ~ '([0-9]{2}).*(\1).*(\1)'
Here is a dbfiddle.

Related

counting comma separated values mysql-postgre

I have a table called "feedback" with one column called "emotions". Each emotions value holds a varying number of arbitrary values, like
emotions
sad, happy
happy, angry, boring
boring
sad, happy, boring, laugh
and so on, with different values and different lengths.
So the question is: what query would produce the following result in MySQL or Postgres?
emotion | count
happy | 3
angry | 1
sad | 2
boring | 3
laugh | 1
Based on "SQL: Count of items in comma-separated column in a table", we could try using
SELECT value as [Holiday], COUNT(*) AS [Count]
FROM OhLog
CROSS APPLY STRING_SPLIT([Holidays], ',')
GROUP BY value
but that won't help because it is for SQL Server, not MySQL or Postgres. Does anyone have an idea how to translate that SQL Server query to MySQL?
Thank you so much, I really appreciate it.
Using Postgres:
create table emotions(id integer, emotions varchar);
insert into emotions values (1, 'sad, happy');
insert into emotions values (2, 'happy, angry, boring');
insert into emotions values (3, 'boring');
insert into emotions values (4, 'sad, happy, boring, laugh');
select
emotion, count(*)
from
(select
trim(regexp_split_to_table(emotions, ',')) as emotion
from emotions) as t
group by
emotion;
emotion | count
---------+-------
happy | 3
sad | 2
boring | 3
laugh | 1
angry | 1
The string function regexp_split_to_table splits the string on ',' and returns the individual elements as rows. Since there are spaces between the ',' and the following word, trim is used to remove them. This generates a 'table' that is used as a sub-query. The outer query then groups by the emotion field and counts the rows in each group.
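The same split, trim and group steps can be sketched outside the database to see what the query is doing. This is only a Python illustration of the logic, not part of the Postgres answer; the rows list stands in for the emotions column.

```python
from collections import Counter

# Stand-in for the emotions column of the sample table.
rows = [
    "sad, happy",
    "happy, angry, boring",
    "boring",
    "sad, happy, boring, laugh",
]

# Split every row on ',', trim the surrounding spaces, then count each value.
counts = Counter(item.strip() for row in rows for item in row.split(","))

for emotion, count in sorted(counts.items()):
    print(emotion, count)
```

Without the trim step, ' happy' and 'happy' would be counted as two different emotions.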
Try the following using MySQL 8.0:
WITH recursive numbers AS
(
select 1 as n
union all
select n + 1 from numbers where n < 100
)
,
Counts as (
select trim(substring_index(substring_index(emotions, ',', n),',',-1)) as emotions
from Emotions
join numbers
on char_length(emotions) - char_length(replace(emotions, ',', '')) >= n - 1
)
select emotions,count(emotions) as counts from Counts
group by emotions
order by emotions
See a demo from db-fiddle.
The recursive query generates the numbers from 1 to 100, on the assumption that no row holds more than 100 sub-strings; change this limit to suit your data.
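If the nested substring_index calls are hard to follow, this rough Python sketch walks through the same steps. The substring_index function here is my own stand-in written to behave like MySQL's SUBSTRING_INDEX for this purpose, and split_items and max_items are hypothetical names, not anything from MySQL.

```python
def substring_index(text, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (illustration only)."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of the count-th delimiter from the end
    return ""

def split_items(text, max_items=100):
    # Same idea as the join condition: number of commas >= n - 1.
    comma_count = len(text) - len(text.replace(",", ""))
    items = []
    for n in range(1, max_items + 1):
        if comma_count >= n - 1:
            prefix = substring_index(text, ",", n)              # first n items
            items.append(substring_index(prefix, ",", -1).strip())  # last item of that prefix
    return items

items = split_items("sad, happy, boring, laugh")
print(items)
```

The inner call keeps the first n items, the outer call with -1 keeps the last of those, so each n yields exactly the n-th item.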
I've used MySQL 8.0; the query has no limit on the number of sub-strings. (Thanks to Ahmed for the intuition on the recursive clause.)
WITH RECURSIVE cte AS (
SELECT ( LENGTH(REGEXP_REPLACE(emotions, ' ?[A-Za-z]+ ?', ''))+1) AS n, emotions AS subs
FROM feedback
UNION ALL
SELECT n-1 AS n, ( SUBSTRING_INDEX(subs, ', ', n-1) ) AS subs
FROM cte
HAVING n>0
)
SELECT SUBSTRING_INDEX(subs, ', ', -1) AS emotions, COUNT(subs) AS cnt
FROM cte
GROUP BY emotions

MySQL group by comma separated list unique [duplicate]

This question already has answers here:
Is storing a delimited list in a database column really that bad?
(10 answers)
Closed 2 years ago.
The column textfield has comma-separated list values
ID | textfield
1 | english,russian,german
2 | german,french
3 | english
4 | null
I'm attempting to count the number of languages in textfield. The default language is "English", so a null counts as "English". The correct number of languages is 4 (english, russian, german, french).
Here is my query to attempt doing this:
SELECT SUM((length(`textfield`) - length(replace(`textfield`, ',', '')) + 1)) as my
FROM yourtable;
The result I get is 6; I don't know how to group the languages.
Here is the fiddle:
http://sqlfiddle.com/#!9/0e532/1
The desired result is 4. How do I solve this?
Identifying the source of error
Your query counts how many languages there are in each row and adds the counts together, so it does not take duplicates into account. Since English and German each show up twice in the table, each is counted twice, hence the six in your example. Another issue is that your current code treats null as what null truly means: the absence of a value.
For example, if your database was
ID | textfield
---|----------
1 | null
you would also be arriving at incorrect results (more on this below).
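Here is a small Python sketch of the difference between the two counts, just to make the arithmetic visible. The rows list mirrors the sample table, None plays the role of null, and treating null as 'english' is the question's rule, not something the database does for you.

```python
rows = ["english,russian,german", "german,french", "english", None]

def items_in_row(textfield):
    # Mirrors length(textfield) - length(replace(textfield, ',', '')) + 1;
    # a null input gives a null result, as it would in SQL.
    if textfield is None:
        return None
    return len(textfield) - len(textfield.replace(",", "")) + 1

per_row = [items_in_row(r) for r in rows]

# SUM skips nulls, so duplicates are counted once per row they appear in.
summed = sum(c for c in per_row if c is not None)

# Distinct languages, with null replaced by the default 'english'.
distinct = {lang
            for r in rows
            for lang in (r if r is not None else "english").split(",")}

print(per_row, summed, len(distinct))
```

The sum is 6 because it adds 3 + 2 + 1, while the distinct set holds only the four languages.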
Solution
This gets you a comma separated result of the languages, no duplicates.
SELECT
GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(textfield, ',', n.digit+1), ',', -1)) textfield
FROM
yourtable
INNER JOIN
(SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n
ON LENGTH(REPLACE(textfield, ',', '')) <= LENGTH(textfield)-n.digit;
This query can serve as a subquery for what you were attempting to do in the question. In other words, instead of length(`textfield`) ... you would use the resulting column from this query.
Null as in English
This logic should not be implemented at the database level, IMHO. If you want to go ahead and treat null entries as English, that is fine. The downside is the example I gave above: when a query computes the total number of languages in the database and English appears only as a null value rather than as an explicitly stated language, the query won't count English. But you can't just add 1 to the number of languages either, because English might already be listed explicitly.
Recommendations:
Avoid comma-separated lists in databases by normalizing your data
Let a null field mean "no value"
For version 5.6 (like in the fiddle)
SELECT COUNT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(languages.textfield, ',', numbers.num), ',', -1)) languages_count
FROM (SELECT COALESCE(textfield, 'english') textfield
FROM yourtable) languages
JOIN (SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) numbers
ON numbers.num <= LENGTH(languages.textfield) - LENGTH(REPLACE(languages.textfield, ',', '')) + 1;
For version 8.x (as claimed in a comment)
SELECT COUNT(DISTINCT jsontable.value) languages_count
FROM yourtable
CROSS JOIN JSON_TABLE( CONCAT('["', REPLACE(COALESCE(textfield, 'english'), ',', '","'), '"]'),
"$[*]" COLUMNS( value VARCHAR(254) PATH "$" )
) AS jsontable;

Custom SQL query, for sorting values with "-" as separators

I am trying to create an ORDER BY to sort my values properly.
The values contain a string and anywhere from zero to three sets of numbers separated by a -.
Example:
dog-2-13
dog-13-54-3
dog-25
cat-63-12
cat
I want them to be sorted firstly by the string in front and then by each of the "number sections" so that: dog-2-14 > dog-2-13 but dog-1-14 < dog-2-13.
Expected result (with more examples to make it clearer):
cat
cat-63-12
dog-2-13
dog-2-14
dog-3
dog-13-53-3
dog-13-54-3
dog-13-54-4
dog-25
I'm a SQL novice and completely lost. Thank you!
Please try...
SELECT fieldName
FROM
(
SELECT fieldName AS fieldName,
SUBSTRING_INDEX( fieldName,
'-',
1 ) AS stringComponent,
CONVERT( SUBSTRING_INDEX( SUBSTRING( fieldName,
CHAR_LENGTH( SUBSTRING_INDEX( fieldName, '-', 1 ) ) + 2 ),
'-',
1 ),
UNSIGNED ) AS firstNumber,
CONVERT( SUBSTRING_INDEX( SUBSTRING( fieldName,
CHAR_LENGTH( SUBSTRING_INDEX( fieldName, '-', 2 ) ) + 2 ),
'-',
1 ),
UNSIGNED ) AS secondNumber,
CONVERT( SUBSTRING( fieldName,
CHAR_LENGTH( SUBSTRING_INDEX( fieldName, '-', 3 ) ) + 2 ),
UNSIGNED ) AS thirdNumber
FROM table1
) tempTable
ORDER BY stringComponent,
firstNumber,
secondNumber,
thirdNumber;
The inner SELECT grabs the field (which I am assuming is called fieldName) and its string and three numeric components, placing each in a separate field with its own alias. Each subfield must be included at this point for sorting purposes.
The outer SELECT then chooses the original field and sorts the list by those values. The ORDER BY sits in the outer query because an ordering applied inside a derived table is not guaranteed to carry over to the final result.
The three outer instances of SUBSTRING_INDEX() are used to grab the desired fields from their first argument. As SUBSTRING_INDEX() with a count of 1 grabs all of the string from the beginning to just before the first occurrence of the delimiting character, this makes finding the first field easy (Note: I am assuming that the first field contains no hyphens).
The first argument for the remaining occurrences of SUBSTRING_INDEX() is formed by using SUBSTRING() to grab everything after the already parsed part of fieldName and the delimiting character that follows it. It is told where this is by using CHAR_LENGTH() to count the number of characters before that delimiting character, then adding 1 for the delimiting character itself and another 1 to point SUBSTRING() to the character after it.
Where a numerical field is absent, SUBSTRING() returns an empty string rather than NULL, and CONVERT() turns that empty string into 0 (MySQL raises a warning). A missing field therefore sorts the same as an explicit 0.
The numerical fields are converted into unsigned Integers using CONVERT(). Unsigned integers were chosen as the supplied data does not contain any real numbers. If there are real values then you will need to replace UNSIGNED with DECIMAL. I have also assumed that all of the numbers will be positive.
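To sanity-check the intended ordering without a database, the same idea (one string component, then up to three numbers compared numerically) can be sketched in Python. The sort_key function is a hypothetical helper of mine, and padding missing numbers with 0 is an assumption that happens to reproduce the expected result from the question.

```python
def sort_key(value):
    # Split 'dog-13-54-3' into the leading string and its numeric sections.
    name, *numbers = value.split("-")
    nums = [int(n) for n in numbers]
    # Pad to three numbers so that 'dog-3' compares as (3, 0, 0).
    nums += [0] * (3 - len(nums))
    return (name, *nums)

values = ["dog-13-54-4", "dog-25", "cat-63-12", "dog-2-14", "cat",
          "dog-3", "dog-13-53-3", "dog-2-13", "dog-13-54-3"]

ordered = sorted(values, key=sort_key)
for v in ordered:
    print(v)
```

Comparing the numbers as integers is what puts dog-3 before dog-13; a plain string comparison would sort them the other way round.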
Further reading...
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_substring-index
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_substring
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_char-length
https://dev.mysql.com/doc/refman/5.7/en/cast-functions.html#function_convert
If you have any questions or comments, then please feel free to post a Comment accordingly.
You can use a Query like this:
SELECT *
FROM yourTable
ORDER BY SUBSTRING_INDEX( SUBSTRING_INDEX(cat,'-',2), '-', -1);
sample
mysql> SELECT SUBSTRING_INDEX('dog-13-54-4','-',2);
+--------------------------------------+
| SUBSTRING_INDEX('dog-13-54-4','-',2) |
+--------------------------------------+
| dog-13 |
+--------------------------------------+
1 row in set (0,00 sec)
mysql>
mysql> SELECT SUBSTRING_INDEX( SUBSTRING_INDEX( 'dog-13-54-4','-',2), '-', -1);
+------------------------------------------------------------------+
| SUBSTRING_INDEX( SUBSTRING_INDEX( 'dog-13-54-4','-',2), '-', -1) |
+------------------------------------------------------------------+
| 13 |
+------------------------------------------------------------------+
1 row in set (0,00 sec)
mysql>

Trying to select specific values from a mysql column

I have a mysql (text)column that contains all comments with hash tags and I'm looking for a way to select only the hash tags
Id | Column
1 | I'm #cool and #calm
2 | l like #manchester
3 | #mysql troubles not #cool
You can sort of do what you want by using substring_index() to do the parsing. Assuming that the character after the hash tag is a space, you can do:
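select t.*,
-- take the text up to the (n+1)-th '#', keep what follows its last '#', then cut at the first space
substring_index(substring_index(substring_index(comment, '#', n.n + 1), '#', -1), ' ', 1) as tag
from yourtable t join
(select 1 as n union all select 2 union all select 3) n
on n.n <= length(t.comment) - length(replace(t.comment, '#', '')) ;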
The fancy on clause is counting the number of # in the comment, which is counting the number of tags.
You can use Regular Expressions
Try this Regular Expression:
/(#[A-Za-z])\w+
Demo:
http://regexr.com/3a2q7
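As a quick way to try that expression on the sample rows, here is a sketch in Python rather than MySQL. I have dropped the capture group and the leading slash, since with a group Python's findall would return only the captured part; the comments list is just the sample data from the question.

```python
import re

comments = [
    "I'm #cool and #calm",
    "l like #manchester",
    "#mysql troubles not #cool",
]

# '#', then a letter, then one or more word characters.
hashtag = re.compile(r"#[A-Za-z]\w+")

tags = [hashtag.findall(c) for c in comments]
for comment, found in zip(comments, tags):
    print(comment, "->", found)
```

Note that the pattern needs at least two characters after the #, so a one-letter tag such as #a would not be picked up.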

Mysql - count values from comma-separated field [duplicate]

This question already has answers here:
How to count items in comma separated list MySQL
(6 answers)
Closed last year.
I have to do some statistics on a read-only db where values are stored in a weird form.
Example: I have 2 rows like
ID | text field
1 | 1001,1003,1004
2 | 1003, 1005
I need to be able to count that this is "5".
I don't have write access, so I don't know how to read and count the values directly without creating a function or something like that.
Clever solution here on SO: How to count items in comma separated list MySQL
LENGTH(textfield) - LENGTH(REPLACE(textfield, ',', '')) + 1
EDIT
Yes, you can select it as an additional column, corrected to use CHAR_LENGTH from @HamletHakobyan's answer:
SELECT
ID,
textfield,
(CHAR_LENGTH(textfield) - CHAR_LENGTH(REPLACE(textfield, ',', '')) + 1) as total
FROM table
SELECT SUM(LENGTH(textfield) - LENGTH(REPLACE(textfield, ',', '')) + 1)
FROM tablename
There is a small but significant omission in all answers. They work only if the comma occupies a single byte in the column's character set (as in latin1 or utf8), because the LENGTH function returns the number of bytes, not characters. In an encoding such as ucs2 or utf16, where a comma takes two bytes, every comma is counted twice. The right answer is to use CHAR_LENGTH, which returns the number of characters.
SELECT
SUM(CHAR_LENGTH(textfield) - CHAR_LENGTH(REPLACE(textfield, ',', '')) + 1) cnt
FROM yourTable
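The byte versus character difference is easy to see in a small Python sketch. Encoding to bytes plays the role of LENGTH and the plain string length plays the role of CHAR_LENGTH; count_items is a hypothetical helper, and utf-16-be is used here as a stand-in for a two-byte MySQL character set.

```python
def count_items(text, encoding=None):
    # With an encoding, measure bytes (like LENGTH); without, measure characters (like CHAR_LENGTH).
    if encoding:
        measure = lambda s: len(s.encode(encoding))
    else:
        measure = len
    return measure(text) - measure(text.replace(",", "")) + 1

text = "1001,1003,1004"

by_chars = count_items(text)                   # character based
by_utf8_bytes = count_items(text, "utf-8")     # comma is one byte
by_utf16_bytes = count_items(text, "utf-16-be")  # comma is two bytes

print(by_chars, by_utf8_bytes, by_utf16_bytes)
```

With one-byte commas both approaches agree on 3, while the two-byte encoding reports 5 for the same value.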
You could use something like this:
select sum(total) TotalWords
from
(
select length(`text field`) - length(replace(`text field`, ',', '')) + 1 total
from yourtable
) x
See SQL Fiddle with Demo
SELECT (LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) + 1) as value_count
FROM table_name
Here LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) gives the number of commas in the column value of each row. Adding 1 to it gives the number of comma-separated values.
None of the above works for me.
The only one that works is the one below:
SELECT (length(`textfield`) - length(replace(`textfield`, ',', '')) + 1) as my
FROM yourtable;
This is my fiddle
http://sqlfiddle.com/#!9/d5a8e1/10
If someone is looking for a solution that returns 0 for empty fields:
IF(LENGTH(column_name) > 0, LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) + 1, 0)