Search comma separated values in a MySQL column that contains a comma separated string

I am trying to search for comma separated values in a database table column that contains a comma separated string.
My table:
id  interest  status
------------------------
1   1,2,3     1
2   4         1
My search input can be 1,2 or 3,2 or 3 or 1,4 and so on. Any combination can occur.
I want to show every id whose interest list contains any number from the comma separated search input.
For example, search for 1,4 should return
id
--
1
2
For example, search for 3,2 should return
id
--
1
I have tried using IN and FIND_IN_SET, but neither gave me the result I want. Is there any other option?
SELECT * FROM `tbl_test` WHERE interest IN (3);
The above query returns an empty set.

As Jens has pointed out in the comments, it is highly recommended to normalize your schema.
If you wish to continue with strings of comma separated values, you will have to look at complex regex matching (which I leave to you to explore).
However, one more alternative is to convert your interest column to the JSON data type. MySQL 5.7 and above support it.
CREATE TABLE IF NOT EXISTS `tbl` (
`id` int(6) unsigned NOT NULL,
`interest` JSON DEFAULT NULL,
`status` int(1) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `tbl` (`id`, `interest`,`status`) VALUES
(1, '[1,2,3,4]',1),
(2, '[1,2]',1),
(3, '[3]',1);
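And then query it as follows. Note that JSON_CONTAINS matches a row only when its array contains every element of the candidate array; to match rows that contain any one of the values, MySQL 8.0.17 and later provide JSON_OVERLAPS:
select id from tbl where JSON_CONTAINS( interest ,'[1,2]');
select id from tbl where JSON_CONTAINS( interest ,'[3,4]');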
...
You can see it in action in this SQL Fiddle.
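The normalization recommended at the top of this answer could look roughly like the sketch below. It uses Python's sqlite3 module only as a runnable stand-in for MySQL, and the table name tbl_interest and the helper ids_matching_any are hypothetical names chosen for illustration. With one row per (id, interest) pair, the "any of these values" search becomes a plain IN lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_interest (      -- hypothetical junction table
        id       INTEGER NOT NULL,   -- id of the original row
        interest INTEGER NOT NULL,   -- one interest value per row
        PRIMARY KEY (id, interest)
    );
    INSERT INTO tbl_interest (id, interest) VALUES
        (1, 1), (1, 2), (1, 3),      -- was '1,2,3'
        (2, 4);                      -- was '4'
""")

def ids_matching_any(search):
    """Return the ids that have at least one of the comma separated values."""
    values = [int(v) for v in search.split(",")]
    placeholders = ",".join("?" * len(values))
    rows = conn.execute(
        "SELECT DISTINCT id FROM tbl_interest "
        f"WHERE interest IN ({placeholders}) ORDER BY id",
        values,
    )
    return [row[0] for row in rows]

print(ids_matching_any("1,4"))  # [1, 2]
print(ids_matching_any("3,2"))  # [1]
```

The same SELECT should carry over to MySQL unchanged, and an index on the interest column can then be used for the lookup.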

Related

How to find all instances of URLs with a particular domain in a freetext field in MySQL?

I have a MySQL DB, and a free text field which has a bunch of free text, and potentially URLs in it.
I would like to be able to find URLs that start with "https://hotspot.com", get the full URL, and any instances within the content. For example if we have 3 urls in the same field, I would like to get all 3 of them.
The table would look like:
CREATE TABLE IF NOT EXISTS `sample` (
`id` int(6) unsigned NOT NULL,
`content` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `sample` (`id`, `content`) VALUES
('1', 'chimichangas and stuff https://hotspot.com/files/filename.pdf'),
('2', 'One hundred angels can dance on the head of a pin'),
('3', 'The last unihorn https://hotspot.com/files/anotherfile.pdf and its so cool https://hotspot.com/files/morefile.png'),
('4', 'The earth is like a ball. https://yahoo.com/files/filename/pdf');
And I would like to get something like
1, https://hotspot.com/files/filename.pdf
3, https://hotspot.com/files/anotherfile.pdf
3, https://hotspot.com/files/morefile.png
(The id of the content, and the URLs it found in it. Although the format does not matter much as long as I get the info)
I was trying to get it with substring_index following some examples I found. But I am not sure I understand them. For example: Extract multiple URL of Text from MySQL column
Your attempt at REGEXP_SUBSTR works. I don't know why it returns 0 rows for you, but here's my working example; I took yours and changed it a little:
SELECT
REGEXP_SUBSTR(content,'(https?:\/\/([A-Za-z0-9]+?\.)?hotspot\.com(\/[A-Za-z0-9\-\._~:\/\?#[#\]!$&()*\+,;\=]*)?)',1,1),
REGEXP_SUBSTR(content,'(https?:\/\/([A-Za-z0-9]+?\.)?hotspot\.com(\/[A-Za-z0-9\-\._~:\/\?#[#\]!$&()*\+,;\=]*)?)',1,2),
REGEXP_SUBSTR(content,'(https?:\/\/([A-Za-z0-9]+?\.)?hotspot\.com(\/[A-Za-z0-9\-\._~:\/\?#[#\]!$&()*\+,;\=]*)?)',1,3),
REGEXP_SUBSTR(content,'(https?:\/\/([A-Za-z0-9]+?\.)?hotspot\.com(\/[A-Za-z0-9\-\._~:\/\?#[#\]!$&()*\+,;\=]*)?)',1,4)
FROM sample
Result:
col1                                       col2                                    col3  col4
---------------------------------------------------------------------------------------------
https://hotspot.com/files/filename.pdf     NULL                                    NULL  NULL
NULL                                       NULL                                    NULL  NULL
https://hotspot.com/files/anotherfile.pdf  https://hotspot.com/files/morefile.png  NULL  NULL
NULL                                       NULL                                    NULL  NULL
Explanation:
The REGEXP_SUBSTR arguments are, in order:
the string to be matched
the regex pattern
the starting position
the occurrence to return
By selecting occurrences 1 to 4 we can capture up to 4 URLs in a single row.
If you expect more, you can add more columns.
What I changed:
before:
(https?:\/\/(.+?\.)?hotspot\.com(\/[A-Za-z0-9\-\._~:\/\?#[]#!$&'()*\+,;\=]*)?)
after:
(https?:\/\/([A-Za-z0-9]+?\.)?hotspot\.com(\/[A-Za-z0-9\-\._~:\/\?#[#\]!$&()*\+,;\=]*)?)
I just changed your (.+?\.)? into ([A-Za-z0-9]+?\.)? so that, when one string contains multiple URLs, the group cannot match across them (I guess this capture group is for matching subdomains of hotspot.com).
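If the number of URLs per row is not known in advance, one alternative is to fetch the rows and extract the matches in application code. The sketch below is plain Python rather than MySQL, and it makes two assumptions of its own: the pattern is a simplified version of the regex above (anything up to the next whitespace counts as part of the URL), and the rows are hard-coded instead of being read from the sample table.

```python
import re

# Simplified pattern (an assumption, not the exact regex from the answer):
# optional alphanumeric subdomain, then hotspot.com, then an optional path
# that runs until the next whitespace character.
URL_RE = re.compile(r"https?://(?:[A-Za-z0-9]+\.)?hotspot\.com(?:/[^\s]*)?")

# Stand-in for "SELECT id, content FROM sample".
rows = [
    (1, "chimichangas and stuff https://hotspot.com/files/filename.pdf"),
    (2, "One hundred angels can dance on the head of a pin"),
    (3, "The last unihorn https://hotspot.com/files/anotherfile.pdf "
        "and its so cool https://hotspot.com/files/morefile.png"),
    (4, "The earth is like a ball. https://yahoo.com/files/filename/pdf"),
]

def extract_urls(rows):
    """Return one (id, url) pair per matching URL, in row order."""
    return [(row_id, url)
            for row_id, content in rows
            for url in URL_RE.findall(content)]

for row_id, url in extract_urls(rows):
    print(row_id, url)
```

This yields one (id, url) pair per URL found, which is the shape asked for in the question, without having to fix the number of columns up front.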

SELECT FROM Table WHERE exact number not partial is in a string SQL

I have a table that contains a bunch of numbers separated by commas.
I would like to retrieve the rows where an exact number, not a partial number, is within the string.
EXAMPLE:
CREATE TABLE IF NOT EXISTS `teams` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`uids` text NOT NULL,
`islive` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=5 ;
INSERT INTO `teams` (`id`, `name`, `uids`, `islive`) VALUES
(1, 'Test Team', '1,2,8', 1),
(3, 'Test Team 2', '14,18,19', 1),
(4, 'Another Team', '1,8,20,23', 1);
I would like to search where 1 is within the string.
At present, if I use CONTAINS or LIKE, it brings back all rows with a 1 in them, but 18, 19 etc. are not 1; they merely have a 1 within them.
I have set up a sqlfiddle here
Do I need to use a regex?
You only need 1 condition:
select *
from teams
where concat(',', uids, ',') like '%,1,%'
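To see why one condition is enough, here is the same idea as a small Python sketch (the function name contains_id is made up for illustration). Wrapping the list in commas means every element, including the first and the last, is surrounded by commas, so a single substring test covers all positions.

```python
def contains_id(uids, target):
    """Mirror of: concat(',', uids, ',') like '%,<target>,%'."""
    return f",{target}," in f",{uids},"

print(contains_id("1,2,8", 1))      # True  (first element)
print(contains_id("14,18,19", 1))   # False (1 only appears inside 14, 18, 19)
print(contains_id("1,8,20,23", 1))  # True
```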
I would search for all four possible locations of the ID you are searching for:
As the only element of the list.
As the first element of the list.
As the last element of the list.
As an inner element of the list.
The query would look like:
select *
from teams
where uids = '1' -- only
or uids like '1,%' -- first
or uids like '%,1' -- last
or uids like '%,1,%' -- inner
You could probably catch them all with a series of OR conditions:
SELECT ...
WHERE uids LIKE '1,%'
OR uids LIKE '%,1'
OR uids LIKE '%, 1'
OR uids LIKE '%,1,%'
OR uids = '1'
You didn't specify which version of SQL Server you're using, but if you're using 2016+ you have access to the STRING_SPLIT function which you can use in this case. Here is an example:
CREATE TABLE #T
(
id int,
string varchar(20)
)
INSERT INTO #T
SELECT 1, '1,2,8' UNION
SELECT 2, '14,18,19' UNION
SELECT 3, '1,8,20,23'
SELECT * FROM #T
CROSS APPLY string_split(string, ',')
WHERE value = 1
Your SQL Fiddle is using MySQL and your syntax is consistent with MySQL. There is a built-in function to use:
select t.*
from teams t
where find_in_set(1, uids) > 0;
Having said that, FIX YOUR DATA MODEL SO YOU ARE NOT STORING LISTS IN A SINGLE COLUMN. Sorry that came out so loudly, it is just an important principle of database design.
You should have a table called teamUsers with one row per team and per user on that team. There are numerous reasons why your method of storing the data is bad:
Numbers should be stored as numbers, not strings.
Columns should contain a single value.
Foreign key relationships should be properly declared.
SQL (in general) has lousy string handling functions.
The resulting queries cannot be optimized.
Simple things like listing the uids in order or removing duplicates are unnecessarily hard.
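A minimal sketch of the teamUsers idea follows, run through Python's sqlite3 module purely so that it is executable here. The column names team_id and user_id and the helper teams_with_user are assumptions for illustration, and the teams table is trimmed to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE teamUsers (          -- one row per team and per user
        team_id INTEGER NOT NULL REFERENCES teams(id),
        user_id INTEGER NOT NULL,     -- stored as a number, not a string
        PRIMARY KEY (team_id, user_id)
    );
    INSERT INTO teams (id, name) VALUES
        (1, 'Test Team'), (3, 'Test Team 2'), (4, 'Another Team');
    INSERT INTO teamUsers (team_id, user_id) VALUES
        (1, 1), (1, 2), (1, 8),
        (3, 14), (3, 18), (3, 19),
        (4, 1), (4, 8), (4, 20), (4, 23);
""")

def teams_with_user(user_id):
    """Names of the teams that contain exactly this user id."""
    rows = conn.execute(
        "SELECT t.name FROM teams t "
        "JOIN teamUsers tu ON tu.team_id = t.id "
        "WHERE tu.user_id = ? ORDER BY t.id",
        (user_id,),
    )
    return [row[0] for row in rows]

print(teams_with_user(1))   # ['Test Team', 'Another Team']
print(teams_with_user(18))  # ['Test Team 2']
```

With this layout the "exact number, not partial" problem disappears, because the comparison is an integer equality rather than a string match.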

MYSQL CSV column check for exclude

I need to find the records that don't have a specific value in a CSV column. Below is the table structure:
CREATE TABLE `employee` (
`id` int NOT NULL AUTO_INCREMENT,
`first_name` varchar(100) NOT NULL,
`last_name` varchar(100) NOT NULL,
`keywords` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Sample record1: 100, Sam, Thompson, "50,51,52,53"
Sample record2: 100, Wan, Thompson, "50,52,53"
Sample record3: 100, Kan, Thompson, "53,52,50"
50 = sports
51 = cricket
52 = soccer
53 = baseball
I need to find the names of the employees who have the tags "sports,soccer,baseball" but not cricket,
so the result should contain only the 2nd and 3rd records in this example, as they don't have 51 (cricket) but have all of the other 3, though in a different order.
My query is below, but I couldn't get it to work.
SELECT t.first_name FROM `User` `t` WHERE (keywords like '50,52,53') LIMIT 10
Is there anything like an "unlike" option? I am confused about how to get this working.
You could use FIND_IN_SET:
SELECT t.first_name
FROM `User` `t`
WHERE FIND_IN_SET('50', `keywords`) > 0
AND FIND_IN_SET('52', `keywords`) > 0
AND FIND_IN_SET('53', `keywords`) > 0
AND FIND_IN_SET('51', `keywords`) = 0;
Keep in mind it could be slow. The correct way is to normalize your table structure.
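The logic of those four conditions can be spelled out in a few lines of Python (an illustration only; the function name matches and the hard-coded sample rows are assumptions taken from the question's sample records). Every required tag must be present and no excluded tag may be present, regardless of the order in which the values are stored.

```python
def matches(keywords, required, excluded):
    """True if all required tags and none of the excluded tags are present."""
    tags = set(keywords.split(","))
    return required <= tags and not (excluded & tags)

employees = [
    ("Sam", "50,51,52,53"),
    ("Wan", "50,52,53"),
    ("Kan", "53,52,50"),
]

names = [name for name, keywords in employees
         if matches(keywords, required={"50", "52", "53"}, excluded={"51"})]
print(names)  # ['Wan', 'Kan']
```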
FIND_IN_SET will do the job for you, but it does not use indexes. This is not a bug, it's a feature.
SUBSTRING_INDEX can use an index and return the data as you wish. You don't have an index on it at the moment, But the catch here is that TEXT fields cannot be fully indexed and what you have is a TEXT field.
Normalize!
This is what you really should be doing. It's not a good idea to store comma separated values in a database. You really should be having a keywords table and since the keywords will be short, you can have a char or varchar narrow column which can be fully indexed.

INSTR(str,substr) does not work when str contains 'é' or 'ë' and substr only 'e'

In another post on stackoverflow, I read that INSTR could be used to order results by relevance.
My understanding of col LIKE '%str%' and INSTR(col, 'str') is that they both behave the same. There seems to be a difference in how collations are handled.
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO users (name)
VALUES ('Joël'), ('René');
SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned
INSTR does seem to do some conversion, though: it finds ë in é.
SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0
Am I missing something?
http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)
This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.
Though the INSTR() documentation states that it can be used for multi-byte strings, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case and accent insensitive:
This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.
The bug report states that, although MySQL applies the collation correctly, it only does so when the number of bytes is also identical:
However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.
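The rule described in that quote can be turned into a toy model, sketched below in Python. This is not MySQL's implementation: fold is a crude stand-in for an accent and case insensitive collation, and toy_instr is a hypothetical function that only accepts a window of the haystack if it has the same UTF-8 byte length as the needle. Under those assumptions it reproduces the results from the question.

```python
import unicodedata

def fold(text):
    """Crude stand-in for an accent and case insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def toy_instr(haystack, needle):
    """1-based position of needle in haystack under the toy rule, else 0."""
    needle_bytes = len(needle.encode("utf-8"))
    for start in range(len(haystack)):
        for end in range(start + 1, len(haystack) + 1):
            window = haystack[start:end]
            # The quoted rule: the candidate must have the same byte length.
            if len(window.encode("utf-8")) != needle_bytes:
                continue
            if fold(window) == fold(needle):
                return start + 1
    return 0

e_acute, e_diaeresis = "\u00e9", "\u00eb"   # é and ë, 2 bytes each in UTF-8
print(toy_instr(e_acute, e_diaeresis))      # 1
print(toy_instr(e_acute, "e"))              # 0 ('e' is a single byte)
print(toy_instr("e", e_diaeresis))          # 0
```

In this model 'joel' (4 bytes) cannot match inside 'Joël' because no window of the haystack is both 4 bytes long and collation-equal to it, whereas 'joël' (5 bytes) matches the whole string.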
To adapt the report's example, if you create the following table:
create table t ( needle varchar(10), haystack varchar(10)
) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");
then run this query, you can see the same behaviour demonstrated:
select needle
, haystack
, needle=haystack as `=`
, haystack LIKE CONCAT('%',needle,'%') as `like`
, instr(haystack, needle) as `instr`
from t;
SQL Fiddle

MySQL Order By doesn't work on Concat(enum)

Currently we have an interesting problem regarding the sort order of an enum field in MySQL. The field's enum entries have been listed in the order we want. Just to be safe, we added a CONCAT around it, so it would be cast to char and ordered alphabetically, just as suggested by the MySQL reference (MySQL Reference - Enum):
Make sure that the column is sorted lexically rather than by index number by coding ORDER BY CAST(col AS CHAR) or ORDER BY CONCAT(col).
But that didn't produce the expected results, so we started to investigate further. It seems that the ORDER BY clause doesn't work on a combination of enum and the CONCAT function. I've written the following sample script, which should show my point:
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_bin DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO test
VALUES ('b'), ('c'), ('a');
SELECT * FROM test; -- b, c, a
SELECT * FROM test ORDER BY col1 ASC; -- a, b, c
SELECT * FROM test ORDER BY CAST(col1 AS CHAR) ASC; -- a, b, c
SELECT * FROM test ORDER BY CAST(col1 AS BINARY) ASC; -- a, b, c
SELECT * FROM test ORDER BY CONCAT(col1) ASC; -- b, c, a - This goes wrong
I currently suspect some kind of problem with the collation/encoding, but I'm not sure. My database's default encoding is also utf8. The MySQL version is 5.6.12, but it seems to be reproducible with MySQL 5.1. The storage engine is MyISAM, but it also occurs with the MEMORY engine.
Any help would be appreciated.
Update:
It seems the problem occurs only in MySQL 5.6 and is caused by the collation of the column. With the first CREATE TABLE statement, the queries work fine.
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_general_ci DEFAULT NULL
)
With the second they don't.
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_bin DEFAULT NULL
)
The collation of the table and/or database doesn't seem to affect the queries. The queries can be tested in this SQL Fiddle.
Strange, it works in this fiddle. Do you have a trigger or something?
http://sqlfiddle.com/#!2/0976a/2
But in 5.6 it goes haywire:
http://sqlfiddle.com/#!9/0976a/1
Probably a MySQL bug.
What's more, if you list the values in the enum in the "proper" order, it works:
http://sqlfiddle.com/#!9/a3784/1
In the docs:
ENUM values are sorted based on their index numbers, which depend on
the order in which the enumeration members were listed in the column
specification. For example, 'b' sorts before 'a' for ENUM('b', 'a').
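As a rough illustration of the quoted rule (plain Python, not MySQL; the variable names and the sample values are made up for the example), sorting by declaration index and sorting lexically give different orders for ENUM('b', 'a'):

```python
members = ["b", "a"]  # declaration order, as in ENUM('b', 'a')
index_of = {member: pos for pos, member in enumerate(members, start=1)}

values = ["a", "b", "a"]  # hypothetical column contents

by_index = sorted(values, key=lambda v: index_of[v])  # like ORDER BY col
lexical = sorted(values)                              # like ORDER BY CAST(col AS CHAR)

print(by_index)  # ['b', 'a', 'a']
print(lexical)   # ['a', 'a', 'b']
```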
As per the documentation, the Handling of Enumeration Literals section states that:
If you store a number into an ENUM column, the number is treated as
the index into the possible values, and the value stored is the
enumeration member with that index. (However, this does not work with
LOAD DATA, which treats all input as strings.) If the numeric value is
quoted, it is still interpreted as an index if there is no matching
string in the list of enumeration values. For these reasons, it is not
advisable to define an ENUM column with enumeration values that look
like numbers, because this can easily become confusing.
For example, the following column has enumeration members with string values of '0', '1', and '2', but numeric index values of 1, 2, and 3:
numbers ENUM('0','1','2')
If you store 2, it is interpreted as an
index value, and becomes '1' (the value with index 2). If you store
'2', it matches an enumeration value, so it is stored as '2'. If you
store '3', it does not match any enumeration value, so it is treated
as an index and becomes '2' (the value with index 3).
mysql> INSERT INTO t (numbers) VALUES(2),('2'),('3');
mysql> SELECT * FROM t;
+---------+
| numbers |
+---------+
| 1       |
| 2       |
| 2       |
+---------+
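The rule quoted from the documentation can be mimicked with a toy Python function (store_enum is a hypothetical name; it ignores invalid values, NULL and strict mode, so treat it as a sketch of the quoted paragraph only). A number is taken as an index, a string is matched against the members first, and only if that fails is it treated as an index.

```python
def store_enum(members, value):
    """Toy model of how a literal ends up in an ENUM column."""
    if isinstance(value, int):
        return members[value - 1]   # unquoted number: 1-based index
    if value in members:
        return value                # quoted value that matches a member
    return members[int(value) - 1]  # quoted number with no match: index

members = ["0", "1", "2"]           # numbers ENUM('0','1','2')
print([store_enum(members, v) for v in (2, "2", "3")])  # ['1', '2', '2']
```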
In your case:
INSERT INTO test
VALUES ('2'), ('3'), ('1');
Index value of '2' is 2, '3' is 3 and '1' is 1.
So the output is 2,3,1