I have a MySQL text column that contains comments with hashtags, and I'm looking for a way to select only the hashtags.
Id | Column
1 | I'm #cool and #calm
2 | l like #manchester
3 | #mysql troubles not #cool
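You can do most of what you want by using substring_index() for the parsing. Assuming that the character after each hashtag is a space (or the end of the text), you can do:
select t.*,
-- cut before the (n + 1)-th '#', keep what follows the last '#', then stop at the first space
substring_index(substring_index(substring_index(t.comment, '#', n.n + 1), '#', -1), ' ', 1) as tag
-- "comments" is a placeholder table name (TABLE itself is a reserved word)
from comments t join
(select 1 as n union all select 2 union all select 3) n
on n.n <= length(t.comment) - length(replace(t.comment, '#', ''));
The on clause counts the number of # characters in the comment, which is the number of tags, so each row is repeated once per tag. The derived table n only goes up to 3, so add more numbers if a comment can hold more than three tags.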
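To see why the nesting matters, here is a rough Python sketch of the same parsing. The substring_index() helper is only an approximation of MySQL's SUBSTRING_INDEX written for this illustration, and hashtags() is a made-up name. The sketch keeps the text after the last # before cutting at the first space, and it assumes a tag ends at a space or at the end of the comment.

```python
def substring_index(text, delim, count):
    """Approximate stand-in for MySQL's SUBSTRING_INDEX (illustration only)."""
    if count == 0 or delim == "":
        return ""
    parts = text.split(delim)
    if count > 0:
        # everything before the count-th delimiter (whole string if there are fewer)
        return delim.join(parts[:count])
    # negative count: everything after the count-th delimiter from the right
    return delim.join(parts[count:])


def hashtags(comment):
    # same idea as LENGTH(x) - LENGTH(REPLACE(x, '#', '')) in the ON clause
    tag_count = comment.count("#")
    tags = []
    for n in range(1, tag_count + 1):
        up_to_next_hash = substring_index(comment, "#", n + 1)
        after_last_hash = substring_index(up_to_next_hash, "#", -1)
        tags.append(substring_index(after_last_hash, " ", 1))
    return tags


print(hashtags("I'm #cool and #calm"))
print(hashtags("#mysql troubles not #cool"))
```

Punctuation directly after a tag (for example "#cool,") would stay attached to it, since only a space ends a tag here.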
You can use regular expressions.
Try this pattern:
(#[A-Za-z])\w+
Demo:
http://regexr.com/3a2q7
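If the extraction can happen in application code rather than in the query, the pattern is easy to try out. This is a minimal sketch using Python's re module; find_hashtags() is a name made up for the example, and the sample comments are the ones from the question. Note that the pattern needs at least two characters after the #, so a one-letter tag like #a would be skipped.

```python
import re

# '#', one letter, then one or more word characters
HASHTAG = re.compile(r"#[A-Za-z]\w+")


def find_hashtags(comment):
    """Return every hashtag in the comment, including the leading '#'."""
    return HASHTAG.findall(comment)


for comment in ["I'm #cool and #calm", "I like #manchester", "#mysql troubles not #cool"]:
    print(find_hashtags(comment))
```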
The column textfield holds comma-separated list values:
ID | textfield
1 | english,russian,german
2 | german,french
3 | english
4 | null
I'm attempting to count the number of distinct languages in textfield. The default language is "English", so a null should count as "English". The correct number of languages is 4 (english, russian, german, french).
Here is my query to attempt doing this:
SELECT SUM((length(`textfield`) - length(replace(`textfield`, ',', '')) + 1)) as my
FROM yourtable;
The result I get is 6; I don't know how to group the languages.
Here is a fiddle:
http://sqlfiddle.com/#!9/0e532/1
The desired result is 4. How do I solve this?
Identifying the source of error
What your query is doing is counting how many languages are in each row and adding those counts together. It does not take duplicates into account: English and German each appear twice in the table and are each counted twice, hence the six in your example. Another issue is that your current code treats null as what null truly means, the absence of a value, so a null row contributes nothing.
For example, if your database was
ID | textfield
---|----------
1 | null
you would also be arriving at incorrect results (more on this below).
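The two problems are easy to reproduce outside the database. This small Python sketch uses the sample rows from the question; total_per_row() and distinct_languages() are names made up for the illustration, and None stands in for SQL NULL.

```python
def total_per_row(rows):
    """Mimics SUM(commas + 1): adds up each row's item count, skipping NULLs."""
    return sum(r.count(",") + 1 for r in rows if r is not None)


def distinct_languages(rows):
    """Collects each language once, still treating NULL as 'no value'."""
    seen = set()
    for r in rows:
        if r is not None:
            seen.update(r.split(","))
    return seen


rows = ["english,russian,german", "german,french", "english", None]
print(total_per_row(rows))               # 6: english and german are counted twice
print(len(distinct_languages(rows)))     # 4
print(len(distinct_languages([None])))   # 0: a table with only a null row finds no language
```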
Solution
This gets you a comma-separated list of the languages, with no duplicates.
SELECT
GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(textfield, ',', n.digit+1), ',', -1)) textfield
FROM
yourtable
INNER JOIN
(SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6) n
ON LENGTH(REPLACE(textfield, ',', '')) <= LENGTH(textfield)-n.digit;
This query can serve as a subquery for what you were attempting in the question. In other words, instead of applying length(`textfield`) ... to the raw column, you would apply it to the textfield column returned by this query.
Null as in English
This logic should not be implemented at the database level, IMHO. If you want to go ahead and treat null entries as English, that is fine. The downside is the example I gave above: when a query computes the total number of languages in the database and English only ever appears as a null value rather than being stated explicitly, the query won't count English (it's null). But you can't just add 1 to every result either, because English might already be listed explicitly.
Recommendations:
Avoid comma-separated lists in databases by normalizing your data
Let a null field mean "no value" rather than a default language
For version 5.6 (like in the fiddle)
SELECT COUNT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(languages.textfield, ',', numbers.num), ',', -1)) languages_count
FROM (SELECT COALESCE(textfield, 'english') textfield
FROM yourtable) languages
JOIN (SELECT 1 num UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) numbers
ON numbers.num <= LENGTH(languages.textfield) - LENGTH(REPLACE(languages.textfield, ',', '')) + 1;
For version 8.x (as claimed in a comment)
SELECT COUNT(DISTINCT jsontable.value) languages_count
FROM yourtable
CROSS JOIN JSON_TABLE( CONCAT('["', REPLACE(COALESCE(textfield, 'english'), ',', '","'), '"]'),
"$[*]" COLUMNS( value VARCHAR(254) PATH "$" )
) AS jsontable;
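The CONCAT/REPLACE part of that query is just string surgery that turns the list into a JSON array, which JSON_TABLE can then expand into rows. A rough Python sketch of the same idea follows; split_via_json() and distinct_count() are hypothetical helper names, and None stands in for NULL. Like the SQL, it assumes the values contain no double quotes or backslashes, which would break the generated JSON.

```python
import json


def split_via_json(textfield, default="english"):
    """Turn 'a,b,c' into '["a","b","c"]' and parse it, as the query does."""
    value = textfield if textfield is not None else default  # COALESCE
    as_json = '["' + value.replace(",", '","') + '"]'
    return json.loads(as_json)


def distinct_count(rows):
    """COUNT(DISTINCT value) over all expanded rows."""
    return len({item for row in rows for item in split_via_json(row)})


rows = ["english,russian,german", "german,french", "english", None]
print(split_via_json(rows[0]))
print(distinct_count(rows))
```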
create table numbers (number varchar(10));
insert into numbers (number) values
('9999999999'),
('5532003644'),
('1212121205'),
('1103220311'),
('1121212128'),
('1234123425');
I'm trying to SELECT only the XY-XY-XY series from the database:
SELECT * FROM numbers
where number regexp '(.*([0-9])\2.*){3}'
This gives me the results:
1212121205, 1121212128 & 1234123425
How is 1234123425 an XY-XY-XY series?
All your questions are interesting SQL puzzles.
This solution also does not involve regex:
select distinct n.number
from (
select 1 start union all select 2 union all select 3 union all
select 4 union all select 5
) s cross join numbers n
where
left(substring(n.number, s.start, 2), 1) <> right(substring(n.number, s.start, 2), 1)
and
n.number like concat(
'%', substring(n.number, s.start, 2),
substring(n.number, s.start, 2),
'%', substring(n.number, s.start, 2), '%'
)
See the demo.
Results:
| number |
| ---------- |
| 1212121205 |
| 1121212128 |
Part of the problem is that you're not escaping the backslash. Backslash is both a string escape and a regexp escape; to get it through to the regexp engine, you need to escape it for the string parser by doubling it. Otherwise, \2 is treated as simply 2, so ([0-9])\2 matches any digit followed by 2, which is why 1234123425 (12, 12, 42) is returned.
But a back-reference is not what you need here. \2 will match whatever ([0-9]) matched, which will make your code look for XX, not XY. I don't think there's a way to write a regexp where you match any character other than the back-reference.
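Both effects can be reproduced in Python's re module, which is used here only as a stand-in and is not the engine MySQL runs. The first pattern is what the engine effectively sees once the backslash is lost. The second is a sketch for engines that support lookahead: a negative lookahead says "a digit that differs from the first one". It assumes XY-XY-XY means three consecutive repetitions of the same two different digits.

```python
import re

numbers = ["9999999999", "5532003644", "1212121205",
           "1103220311", "1121212128", "1234123425"]

# With the backslash swallowed by the string parser, "\2" is just the digit 2
unescaped = re.compile(r"(.*([0-9])2.*){3}")

# X, then Y which must differ from X, then XY twice more
xy_series = re.compile(r"(\d)(?!\1)(\d)(?:\1\2){2}")

unescaped_matches = [n for n in numbers if unescaped.search(n)]
xy_matches = [n for n in numbers if xy_series.search(n)]

print(unescaped_matches)  # includes 1234123425, as in the question
print(xy_matches)         # only the two real XY-XY-XY numbers
```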
This is not quite an answer, but you can do this with regular expressions. The problem is that MySQL (before version 8.0) does not support back-references. Postgres does, so the following does what you intend:
SELECT *
FROM numbers
WHERE number ~ '([0-9]{2}).*(\1).*(\1)'
Here is a dbfiddle.
I need the output in the following order (first grouped by the last 3 letters, then ordered by the first 3 digits):
ColumnA
001_eng
004_eng
002_chn
003_usa
But order by ColumnA gives me
ColumnA
001_eng
002_chn
003_usa
004_eng
This is just sample data. I have hundreds of entries in this format and the values keep changing every day, so listing all the entries inside FIELD() is not a feasible option.
I'm not sure of how to use FIELD() in my case.
You can use FIELD:
select *
from tablename
order by
FIELD(ColumnA, '001_eng', '004_eng', '002_chn', '003_usa')
(Be careful: if a ColumnA value is not in the list, FIELD() returns 0 and those rows are sorted to the top.)
or you can use CASE WHEN:
select *
from tablename
order by
case
when ColumnA='001_eng' then 1
when ColumnA='004_eng' then 2
when ColumnA='002_chn' then 3
when ColumnA='003_usa' then 4
else 5
end
or you can use a separate languages table in which you specify the order:
id | name | sortorder
1 | 001_eng | 1
2 | 002_chn | 3
3 | 003_usa | 4
4 | 004_eng | 2
then you can use a join
select t.*
from
tablename t inner join languages l
on t.lang_id = l.id
order by
l.sortorder
(With proper indexes, this would be the better solution, with the best performance.)
You can use SUBSTRING_INDEX in case all ColumnA values are formatted like in the sample data:
SELECT *
FROM mytable
ORDER BY FIELD(SUBSTRING_INDEX(ColumnA, '_', -1), 'eng', 'chn', 'usa'),
SUBSTRING_INDEX(ColumnA, '_', 1)
Demo here
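The same two-level ordering can be sketched in Python as a sort key, which may help when checking the expected result. SUFFIX_ORDER and sort_key() are names invented for this illustration; the rank mirrors FIELD(), which returns 0 for values missing from the list.

```python
SUFFIX_ORDER = ["eng", "chn", "usa"]


def sort_key(value):
    prefix = value.split("_")[0]        # like SUBSTRING_INDEX(ColumnA, '_', 1)
    suffix = value.rsplit("_", 1)[-1]   # like SUBSTRING_INDEX(ColumnA, '_', -1)
    # FIELD() gives the 1-based position, or 0 when the value is not listed
    rank = SUFFIX_ORDER.index(suffix) + 1 if suffix in SUFFIX_ORDER else 0
    return (rank, prefix)


values = ["001_eng", "002_chn", "003_usa", "004_eng"]
print(sorted(values, key=sort_key))
```

As with FIELD(), a suffix that is not in the list gets rank 0 and sorts ahead of everything else.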
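You can use SUBSTRING() to order by the suffix first and then by the numeric prefix:
SELECT *
FROM table_name
-- last 3 characters first, then the first 3; the suffix groups come out alphabetically (chn, eng, usa)
ORDER BY SUBSTRING(ColumnA, -3), SUBSTRING(ColumnA, 1, 3);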
I have to produce some statistics from a read-only DB where values are stored in a weird form.
example:
I have 2 rows like
ID text field
1 1001,1003,1004
2 1003, 1005
I need to be able to count these and get "5".
I don't have write access, so I need a way to read and count them directly without creating a function or something like that.
Clever solution here on SO: How to count items in comma separated list MySQL
LENGTH(textfield) - LENGTH(REPLACE(textfield, ',', '')) + 1
EDIT
Yes, you can select it as an additional column. Here it is, corrected to use CHAR_LENGTH as in @HamletHakobyan's answer:
SELECT
ID,
textfield,
(CHAR_LENGTH(textfield) - CHAR_LENGTH(REPLACE(textfield, ',', '')) + 1) as total
FROM tablename
SELECT SUM(LENGTH(textfield) - LENGTH(REPLACE(textfield, ',', '')) + 1)
FROM tablename
There is a small but significant omission in all the answers. They only work if the database character set is utf8 or similar, i.e. one where the , symbol takes one byte, because the LENGTH function returns the number of bytes rather than characters. The right answer is to use CHAR_LENGTH, which returns the number of characters.
SELECT
SUM(CHAR_LENGTH(textfield) - CHAR_LENGTH(REPLACE(textfield, ',', '')) + 1) cnt
FROM yourTable
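The byte-versus-character difference is easy to demonstrate in Python, where encoding a string gives its byte length. This is only an illustration of the principle: the two function names are made up, and "utf-16-le" stands in for any character set that stores a comma in two bytes.

```python
def count_items_by_bytes(text, encoding):
    """Like LENGTH(): measures bytes in the given encoding."""
    with_commas = len(text.encode(encoding))
    without_commas = len(text.replace(",", "").encode(encoding))
    return with_commas - without_commas + 1


def count_items_by_chars(text):
    """Like CHAR_LENGTH(): measures characters, whatever the encoding."""
    return len(text) - len(text.replace(",", "")) + 1


value = "1001,1003,1004"
print(count_items_by_bytes(value, "utf-8"))      # 3: a comma is one byte
print(count_items_by_bytes(value, "utf-16-le"))  # 5: each comma is two bytes
print(count_items_by_chars(value))               # 3
```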
You could use something like this:
select sum(total) TotalWords
from
(
select length(`text field`) - length(replace(`text field`, ',', '')) + 1 total
from yourtable
) x
See SQL Fiddle with Demo
SELECT (LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) + 1) as value_count
FROM table_name
Here LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) gives the number of commas in each row's value, and adding 1 to it gives the number of comma-separated values.
None of the above worked for me.
The only one that worked is the one below:
SELECT (length(`textfield`) - length(replace(`textfield`, ',', '')) + 1) as my
FROM yourtable;
This is my fiddle
http://sqlfiddle.com/#!9/d5a8e1/10
If someone is looking for a solution that returns 0 for empty fields:
IF(LENGTH(column_name) > 0, LENGTH(column_name) - LENGTH(REPLACE(column_name, ',', '')) + 1, 0)
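For comparison, the same rule written as a small Python function. count_items() is a name chosen for this sketch, and None stands in for NULL, which the IF() above also maps to 0 because a NULL condition is treated as false. The sample rows are the two from the question plus an empty and a null value.

```python
def count_items(value):
    """Number of comma-separated items; 0 for an empty or missing value."""
    if not value:
        return 0
    return value.count(",") + 1


rows = ["1001,1003,1004", "1003, 1005", "", None]
print([count_items(r) for r in rows])
print(sum(count_items(r) for r in rows))
```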
I am working for a client that stores item tags in the MySQL DB like so (I know, I know - not ideal):
coats_and_jackets-Woven_Jacket-brand:Hobbs;
coats_and_jackets-Woven_Jacket-color:Black;
coats_and_jackets-Woven_Jacket-style:Boucle;
coats_and_jackets-Woven_Jacket-pattern:Plain;
dresses-Pinafore-brand:COS;
dresses-Pinafore-color:Blue _ Navy;
dresses-Pinafore-style:Wool;
dresses-Pinafore-pattern:Plain;
shoes-Ankle_Boot-brand:Topshop;
shoes-Ankle_Boot-color:Black;
shoes-Ankle_Boot-style:Leather;
shoes-Ankle_Boot-pattern:Plain;
bags-Tote-brand:Mulberry;
bags-Tote-color:Brown _ Tan;
bags-Tote-style:Leather;
bags-Tote-pattern:Plain;
shoes-Ballet_shoes-brand:Chanel;
shoes-Ballet_shoes-color:Black;
shoes-Ballet_shoes-style:Leather;
shoes-Ballet_shoes-pattern:Plain;
accessories-Scarf-brand:Zara;
accessories-Scarf-color:Brown _ Tan;
accessories-Scarf-style:Wool;
accessories-Scarf-pattern:Checked;
Each tag is broken down into 4 parts like so: category-type-brand, category-type-color, category-type-style, category-type-pattern
Not all 4 parts of a tag are required; any of them can be omitted from the DB.
I have been tasked with finding out how many tags an item has. In this example 6 tags have been used, each with all 4 parts.
The query I have so far counts all the tag parts, 24 in this example, but I cannot assume that each tag will have all 4 parts stored, so I cannot divide the number of parts by 4 to get the number of tags.
In this example, the 6 tags used are as follows:
Coats & Jackets (Woven Jacket)
Dresses (Pinafore)
Shoes (Ankle boot)
Bags (Tote)
Shoes (Ballet Shoes)
Accessories (Scarf)
Now I'm not concerned about the category, type or parts (brand, color, style, pattern) - I'm just concerned about fetching the total amount of tags for this item.
Also, the data example above would be stored in a db row that looks like:
+----------+-------------+----------------------------+
| ID | meta_key | meta_value |
+----------+-------------+----------------------------+
| 1 | tags | coats_and_jackets-wove... |
+----------+-------------+----------------------------+
| 2 | item_desc | Fashion editor |
+----------+-------------+----------------------------+
Help structuring this query would be much appreciated.
The tags use hyphen as a separator. Here is a method for finding the number of tags used by a given item:
select it.*, length(it.tags) - length(replace(it.tags, '-', ''))+1
from itemtags it
This replaces each hyphen with an empty string and measures the difference in lengths, which gives the number of hyphens; adding 1 gives the number of hyphen-separated segments.
Assuming I'm understanding your requirement correctly, how about something like this (with a CTE used to demonstrate the assumed table structure):
WITH CTE1(tag) AS(
select 'coats_and_jackets-Woven_Jacket-brand:Hobbs' union
-- ...
select 'accessories-Scarf-color:Brown _ Tan' union
select 'accessories-Scarf-style:Wool' union
select 'accessories-Scarf-pattern:Checked'
)
, CTE2(tag_prefix) AS(
select LEFT(tag, CHARINDEX('-', tag, CHARINDEX('-', tag) + 1) - 1) from CTE1
)
select tag_prefix, COUNT(*) from CTE2 group by tag_prefix
This will give you results of...
accessories-Scarf 4
bags-Tote 4
coats_and_jackets-Woven_Jacket 4
dresses-Pinafore 4
shoes-Ankle_Boot 4
shoes-Ballet_shoes 4
... which gives you the tag prefix and number of parts used. From there you can count the individual rows or sum the number of parts or whatever else you need...
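The prefix grouping can also be sketched in Python, in case the counting ends up in application code. count_tags() is a made-up name; the sketch assumes the parts are separated by ";" and that category and type never contain a hyphen themselves. The sample string is a shortened version of the data in the question.

```python
def count_tags(meta_value):
    """Count distinct category-type prefixes in a ';'-separated tag string."""
    prefixes = set()
    for part in meta_value.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing ';'
        # split on the first two hyphens only, so hyphens in the value are left alone
        segments = part.split("-", 2)
        prefixes.add(tuple(segments[:2]))
    return len(prefixes)


meta_value = (
    "coats_and_jackets-Woven_Jacket-brand:Hobbs;"
    "coats_and_jackets-Woven_Jacket-color:Black;"
    "dresses-Pinafore-brand:COS;"
    "shoes-Ankle_Boot-style:Leather;"
    "shoes-Ballet_shoes-brand:Chanel;"
)
print(count_tags(meta_value))
```

Because distinct prefixes are counted, a tag with only one or two of its four parts stored still counts once.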
I've just realised that my solution is completely pointless given that I missed the 'mysql' tag ;) but I'll post it up here anyway. Hopefully it can give you a pointer on how to proceed.
WITH CTE1(ID, meta_key, meta_value) AS(
select 1, 'tags', 'coats_and_jackets-Wo...' union all
select 2, 'item_desc', 'Fashion editor'
)
, TagsCTE AS(
select t.ID, x.Item as tag_and_value
from CTE1 t
cross apply dbo.fn_SplitString(t.meta_value, ';') x
where meta_key = 'tags' and LEN(x.Item) > 0
)
select ID, COUNT(parts_count) from (
select ID, COUNT(*) as parts_count
from TagsCTE
group by ID, LEFT(tag_and_value, CHARINDEX('-', tag_and_value, CHARINDEX('-', tag_and_value) + 1) - 1)
) a group by ID
This gives results of:
1 6
Good luck.