Match Regex in MySQL for repeated word - mysql

I'm having a query problem. I use MySQL as my DB. I want to use a REGEXP that matches the result I expect. The table is:
table A
----------------------------------
| ID | Description |
----------------------------------
| 1 | new 2 new 2 new 2 new |
| 2 | new 21 new 2 new |
| 3 | new 12th 2 |
| 4 | 2new 2new |
| 5 | new2 new 2new |
The result I expect:
- the digit 2 may appear only twice
- character after/before 2 must be varchar (except after whitespace)
Table B
---------------------------------
| ID | Description |
---------------------------------
| 4 | 2new 2new |
| 5 | new2 new 2new |
The Query I've got so far:
SELECT * FROM a WHERE
(description REGEXP '^[^2]*2[^2]*2[^2]*$')
could anyone help me to solve this?

Use the regex below to get the descriptions for IDs 4 and 5.
SELECT * FROM a WHERE
(description REGEXP '^2[^2]*2[^2]*|\w+2[^2]*2[^2]*$')
http://sqlfiddle.com/#!2/1284e/18
Explanation:
Split the regex at | into two parts: ^2[^2]*2[^2]* and \w+2[^2]*2[^2]*$. In a regex, ^ anchors the start of the string and $ anchors the end. Because | binds more loosely than everything else, ^ applies only to the first part and $ only to the second.
^2[^2]*2[^2]*
^2 Matches the digit 2 at the start of the string.
[^2]* Matches any character other than 2, zero or more times.
2 Matches the digit 2.
[^2]* Matches any character other than 2, zero or more times.
This gets you the 4th ID.
| The alternation (logical OR) operator: it combines two regexes so that either the one before it or the one after it may match.
\w+2[^2]*2[^2]*$
\w+2 Matches one or more word characters followed by the digit 2. In your example, the 5th ID satisfies this part. Note that, by default, a single backslash inside a MySQL string literal is dropped, so '\w' reaches the regex engine as a literal w; it still matches ID 5 because "new2" happens to contain w2.
[^2]* Matches any character other than 2, zero or more times.
2 Matches the digit 2.
[^2]* Matches any character other than 2, zero or more times.
$ Matches the end of the string.
This gets you the 5th ID.
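To see which rows the pattern above picks, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's re module as a stand-in for MySQL's REGEXP, which is an assumption: the two engines are not identical, although they should agree on a pattern this simple. It tries the pattern twice, once with the literal w that MySQL receives from '\w' by default, and once with a genuine word-character class.

```python
import re

# Sample rows from table A in the question.
rows = {
    1: "new 2 new 2 new 2 new",
    2: "new 21 new 2 new",
    3: "new 12th 2",
    4: "2new 2new",
    5: "new2 new 2new",
}

# The pattern as the regex engine sees it when '\w' sits in a MySQL string
# literal with default settings: the backslash is dropped, leaving a literal w.
as_mysql_sees_it = re.compile(r"^2[^2]*2[^2]*|w+2[^2]*2[^2]*$")

# The same pattern with a genuine word-character class.
with_word_class = re.compile(r"^2[^2]*2[^2]*|\w+2[^2]*2[^2]*$")


def matching_ids(pattern):
    """Return the IDs whose description matches anywhere, like REGEXP does."""
    return [row_id for row_id, text in rows.items() if pattern.search(text)]


literal_w_ids = matching_ids(as_mysql_sees_it)
word_class_ids = matching_ids(with_word_class)
print(literal_w_ids)   # IDs matched with the literal w
print(word_class_ids)  # IDs matched with a real \w
```

With the literal w only IDs 4 and 5 match; with a real word-character class ID 3 ("new 12th 2") matches as well, so the two readings of the pattern are not interchangeable.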

Related

Need help on SQL query - Select data with duplicate multiple data entries in one data field

I need to select all data so that no ID appears more than once.
Here's my sample table:
----------------------------------------------------------------------------------
ID | Zip-Code | Search Query | ID_LIST
----------------------------------------------------------------------------------
1 | 1000 | Query Sample 1 | 13,14,15,
----------------------------------------------------------------------------------
2 | 2000 | Query Sample 2 | 16,13,17,
----------------------------------------------------------------------------------
3 | 3000 | Query Sample 3 | 18,17,13,
----------------------------------------------------------------------------------
4 | 4000 | Query Sample 4 | 15,16,17,18,
----------------------------------------------------------------------------------
5 | 5000 | Query Sample 5 | 19, 20,
You can see that rows 1 and 2 share a duplicate in ID_LIST, which is 13.
Rows 2 and 3 also share duplicates, which are 13 and 17.
What I want to do is make it like this:
----------------------------------------------------------------------------------
ID | Zip-Code | Search Query | ID_LIST
----------------------------------------------------------------------------------
1 | 1000 | Query Sample 1 | 13,14,15,
----------------------------------------------------------------------------------
2 | 2000 | Query Sample 2 | 16,17,
----------------------------------------------------------------------------------
3 | 3000 | Query Sample 3 | 18,
----------------------------------------------------------------------------------
5 | 5000 | Query Sample 5 | 19,20,
What query would be good for this? Any help?
The best way to approach this is to normalize your data, as mentioned in the comments. But if you absolutely have to do it this way, it would be very difficult to do in a single MySQL query.
I would suggest creating a stored procedure for it. As you develop each step, you can look up a solution for that particular step, test it, and build on it. Let me know if any step sounds confusing or unclear.
1. Create a string variable, say v_vals, and initialize it with null. At the end of the procedure it will contain all the distinct values of id_list (13,14...20).
2. Iterate through each row.
3. Count the number of commas in id_list.
4. Loop from 1 to the number of commas.
5. In every iteration, use substring and instring functions to find the position of each comma and extract the values from id_list (13,14...).
6. Use another variable, v_id_list, and put null in it.
7. Search for the values (from step 5) in v_vals. If a value already exists in v_vals, skip it; otherwise append it to both v_vals and v_id_list.
8. Run an update statement to set id_list to v_id_list.
9. Repeat steps 3 to 8 for each row.
Note that v_id_list is reinitialized for each row, whereas v_vals accumulates all the distinct values of id_list.
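The steps above can be sketched outside the database to check the logic before writing the procedure. This is a minimal illustration in Python, not the stored procedure itself: the function name dedupe_id_lists is hypothetical, a set stands in for v_vals, and a per-row list stands in for v_id_list. Dropping rows that end up empty is an assumption taken from the expected output in the question (row 4 disappears there).

```python
def dedupe_id_lists(rows):
    """rows: list of (row_id, id_list) pairs, with id_list like '13,14,15,'."""
    seen = set()  # plays the role of v_vals: every value kept so far
    result = []
    for row_id, id_list in rows:
        kept = []  # plays the role of v_id_list, reset for each row
        for value in id_list.split(","):
            value = value.strip()  # tolerate '19, 20,' style spacing
            if value and value not in seen:
                seen.add(value)
                kept.append(value)
        if kept:  # assumption: rows left with no values are dropped
            result.append((row_id, ",".join(kept) + ","))
    return result


sample = [
    (1, "13,14,15,"),
    (2, "16,13,17,"),
    (3, "18,17,13,"),
    (4, "15,16,17,18,"),
    (5, "19, 20,"),
]
deduped = dedupe_id_lists(sample)
for row in deduped:
    print(row)
```

Rows are processed in ID order, so the first row to mention a value keeps it, which is what the expected table in the question shows.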

Create new records from a field with comma delimited values

I have a poorly created table I want to update. It is set up as
ID
Name
Value
Because a given Name can have more than one value, the Value field is currently a varchar populated with comma-delimited values:
12,15,92
I would like to write an update or create-table query that turns those into separate records, so that a table with
ID | Name | Value
1 | Bob | 5,6,9
2 | Alice| 5,9
3 | Ted | 1
ends up as
1 | Bob | 5
2 | Bob | 6
3 | Bob | 9
4 | Alice | 5
5 | Alice | 9
6 | Ted | 1
Searching online, it appears this is a pretty common issue, and I found one of several functions for splitting delimited fields into records here:
http://kedar.nitty-witty.com/blog/mysql-stored-procedure-split-delimited-string-into-rows
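This is an old question, but MySQL has a related function named GROUP_CONCAT; see the official documentation for details and examples. Note that GROUP_CONCAT works in the opposite direction: it joins values from several rows into one delimited string, so it rebuilds a list such as 5,6,9 rather than splitting one.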
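For the split the question asks about, here is a minimal sketch of the transformation done outside the database, in Python. It is not a MySQL solution: the function name split_rows is hypothetical, and renumbering the IDs from 1 is an assumption taken from the expected output in the question.

```python
def split_rows(records):
    """Turn (id, name, 'v1,v2,...') records into one record per value."""
    out = []
    next_id = 1  # assumption: new rows are renumbered from 1, as in the question
    for _old_id, name, values in records:
        for value in values.split(","):
            value = value.strip()
            if value:  # skip empty pieces from stray commas
                out.append((next_id, name, value))
                next_id += 1
    return out


source = [(1, "Bob", "5,6,9"), (2, "Alice", "5,9"), (3, "Ted", "1")]
normalized = split_rows(source)
for row in normalized:
    print(row)
```

The resulting tuples could then be loaded into a new table with a plain multi-row INSERT.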

SQL operator IN returns only DISTINCT

I have the following query:
SELECT class, subclass ,weight
FROM classes
WHERE classes.term in ('this','paper','present','this','and','this','this')
The above query returns only distinct values. For example I have the following table:
+-----------------------------------+
|class | subclass | term | weight |
+-----------------------------------+
| a | b | this | 3 |
| c | d | paper | 2 |
| e | f | sth | 1 |
+-----------------------------------+
The result I get is:
+-----------------------------------+
|class | subclass | term | weight |
+-----------------------------------+
| a | b | this | 3 |
| c | d | paper | 2 |
+-----------------------------------+
What I actually want is the following:
+-----------------------------------+
|class | subclass | term | weight |
+-----------------------------------+
| a | b | this | 3 |
| a | b | this | 3 |
| a | b | this | 3 |
| a | b | this | 3 |
| c | d | paper | 2 |
+-----------------------------------+
Is there any other way to get all the results, without IN collapsing the duplicates?
The problem is that I cannot change this part: ('this','paper','present','this','and','this','this')
because it is not created by a query. It is a string of words I want to search.
Edit:
- In the original scenario the table contains more than 3000 different words, and the actual string is generated by a function I do not have rights to access; it contains 300+ words with many duplicates.
- In the original scenario I want to add the weight of the word every time it appears in the string.
Edit2:
The result I expect is to sum the weights every time a term appears in string.
Expecting results like the following:
+-----------------------------------+
|class | subclass | term | weight |
+-----------------------------------+
| a | b | this | 12 |
| c | d | paper | 2 |
+-----------------------------------+
Is there any other solution?
Use a join:
select c.*
from (select 'this' as term union all
      select 'paper' as term union all
      select 'present' as term union all
      select 'this' as term union all
      select 'and' as term union all
      select 'this' as term union all
      select 'this' as term
     ) terms
-- left join keeps 'present' and 'and' as all-NULL rows; use an inner join to drop them
left join classes c
  on c.term = terms.term;
This will work in both MySQL and SQLite.
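Since the join is said to work in SQLite, here is a minimal sketch that tries it with Python's built-in sqlite3 module on the sample data from the question. Two things are assumptions beyond the answer: it uses an inner join so that unmatched words such as 'present' produce no rows, and it adds SUM with GROUP BY to produce the totals requested in Edit2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classes (class TEXT, subclass TEXT, term TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO classes VALUES (?, ?, ?, ?)",
    [("a", "b", "this", 3), ("c", "d", "paper", 2), ("e", "f", "sth", 1)],
)

# The word list from the question, duplicates included.
words = ["this", "paper", "present", "this", "and", "this", "this"]

# Build the UNION ALL derived table with one bound parameter per word.
terms_sql = " UNION ALL ".join(["SELECT ? AS term"] * len(words))

query = f"""
    SELECT c.class, c.subclass, c.term, SUM(c.weight) AS weight
    FROM ({terms_sql}) AS terms
    JOIN classes c ON c.term = terms.term
    GROUP BY c.class, c.subclass, c.term
    ORDER BY c.term DESC
"""
totals = conn.execute(query, words).fetchall()
for row in totals:
    print(row)
conn.close()
```

Each duplicate in the word list yields one joined row, so summing the weight per term gives 12 for 'this' (4 times 3) and 2 for 'paper'.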
For reference, see this question on how to count the number of occurrences of a substring in a string:
SELECT m.*, (LENGTH('this paper present this and this this') - LENGTH(REPLACE('this paper present this and this this', term, ''))) / LENGTH(term) AS count
FROM myTable m;
Once you have the number of occurrences for each string, you can multiply that value by the weight to get the total, like this:
SELECT term, weight * (LENGTH('this paper present this and this this') - LENGTH(REPLACE('this paper present this and this this', term, ''))) / LENGTH(term) AS totalWeight
FROM myTable m;
Note that this solution does not take a separated list of words, but concatenates that list into one string.
Here is an SQL Fiddle example for you.
EDIT
If you want the sum of weights for all terms in the string, without regard to the terms themselves, you can just adjust the query to use the SUM() function, and don't use GROUP BY because you want to sum for the whole table:
SELECT SUM(weight * (LENGTH('this paper present this and this this') - LENGTH(REPLACE('this paper present this and this this', term, ''))) / LENGTH(term)) AS totalWeight
FROM myTable m;
EDIT 2
Here is a little more explanation of the query based on lengths. You can break it up into multiple parts:
- LENGTH('this paper present this and this this') returns the number of characters in the string you are searching.
- LENGTH(REPLACE(myString, term, '')) is the length of the string above with your term removed. For example, for 'this' the total length is 37; removing 16 characters (4 for each occurrence) leaves 21.
- Subtracting the second value from the first gives the number of characters in the overall string that belong to your term (37 - 21 = 16).
- Dividing that by the length of term gives the number of occurrences. 16 characters divided by 4 characters per occurrence means the substring occurred 4 times (16 / 4 = 4). Try these steps again with 'paper' and you will see.
The above procedure is illustrated step by step in this SQL Fiddle.
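The same arithmetic can be checked outside SQL. This is a minimal sketch in Python that mirrors the LENGTH/REPLACE trick; the helper name count_occurrences is hypothetical, and the weights come from the sample table in the question.

```python
def count_occurrences(text, term):
    """Count non-overlapping occurrences of term using the length difference."""
    removed = text.replace(term, "")  # same role as REPLACE(text, term, '')
    return (len(text) - len(removed)) // len(term)


text = "this paper present this and this this"
weights = {"this": 3, "paper": 2, "sth": 1}

counts = {term: count_occurrences(text, term) for term in weights}
totals = {term: weights[term] * counts[term] for term in weights}
grand_total = sum(totals.values())

print(len(text))    # length of the searched string
print(counts)       # occurrences per term
print(totals)       # weight multiplied by occurrences
print(grand_total)  # what the SUM() variant returns

# Caveat: the trick counts substrings, not whole words.
inside_other_word = count_occurrences("thistle this", "this")
print(inside_other_word)
```

The last line shows the main limitation: a term that also appears inside a longer word (here 'this' inside 'thistle') is counted too, so the approach is only reliable when no term is a substring of another word in the string.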

Match Regex in MySQL for repeated word in one column

I'm having a query problem. I use MySQL as my DB.
I want to use a REGEXP that matches the result I expect.
The table is:
table A
----------------------------------
| ID | Description |
----------------------------------
| 1 | new 2 new 2 new 2 new |
| 2 | new 2 new 2 new |
| 3 | new 2 |
| 4 | 2 new 2new |
The Result I expected
---------------------------------
| ID | Description |
---------------------------------
| 2 | new 2 new 2 new |
| 4 | 2 new 2new |
The Query I've tried so far:
SELECT * FROM a WHERE (description REGEXP '([^2][^0..9]])2( [^2][^0..9])([^2][^0..9]])2( [^2][^0..9])')
http://sqlfiddle.com/#!2/7d712/2
Could anyone help me to solve this :(?
Your regex isn't doing what you think it does (although I can't quite guess what you think it does...)
A translation of part of your regex:
([^2][^0..9]])2
means:
( # Start a group
[^2] # Match one character except "2"
[^0..9] # Match one character except "0", "." or "9"
] # Match "]"
) # End of group
2 # Match "2"
As @Tim Pietzcker pointed out, your regular expression does not do what you may think it does. If I understand correctly, I believe you are looking for the following regular expression, which returns IDs 2 and 4.
^[^2]*2[^2]*2[^2]*$
Your SQL query would be:
SELECT * FROM a WHERE (description REGEXP '^[^2]*2[^2]*2[^2]*$')
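As a quick check of the pattern, here is a minimal sketch that applies it to the four sample rows from the question. It uses Python's re module as a stand-in for MySQL's REGEXP, which is an assumption; the pattern uses only features common to both engines.

```python
import re

# Exactly two 2s: any non-2 characters, a 2, non-2s, a second 2, non-2s.
exactly_two_twos = re.compile(r"^[^2]*2[^2]*2[^2]*$")

rows = {
    1: "new 2 new 2 new 2 new",
    2: "new 2 new 2 new",
    3: "new 2",
    4: "2 new 2new",
}

matched_ids = [
    row_id for row_id, text in rows.items() if exactly_two_twos.search(text)
]
print(matched_ids)
```

Because the pattern is anchored at both ends and every gap is [^2]*, a third 2 anywhere in the string (as in ID 1) makes the match fail.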

Mysql where in multiple rows

I have this problem with one table:
id | routename | usersid |
1 | route 1 | 1,2,3,5 2 |
2 | route 2 | 5,20,15 3 |
4 | route 4 | 10,15,7,5 |
I need to search for, e.g., userid 5 in the usersid column, but I have no idea how to do it, because each row holds multiple values.
If you cannot change the schema then you will have to use the REGEXP operator to match on a regular expression. For example
where column REGEXP '(^|,)5(,|$)'
This matches the number 5 either at the beginning or end of the field or surrounded by commas (or any combination thereof), to avoid matching other numbers like 15, 55 or 1234567890.
If the table is large, this will perform very slowly because it requires a full table scan.
You might be looking for FIND_IN_SET().
select * from Table1
WHERE FIND_IN_SET(5,usersid)
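To compare the two suggestions, here is a minimal sketch in Python. The regex is the one from the REGEXP answer; find_in_set is a hypothetical helper that imitates what MySQL's FIND_IN_SET returns (the 1-based position of the value in the comma-separated list, or 0 when absent). The sample lists are cleaned-up illustrations, not the exact rows from the question.

```python
import re

# 5 at the start or after a comma, and followed by a comma or the end.
whole_value_5 = re.compile(r"(^|,)5(,|$)")


def find_in_set(needle, csv):
    """Imitate FIND_IN_SET: 1-based position in the list, or 0 if absent."""
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


samples = ["1,2,3,5", "5,20,15", "10,15,7,5", "15,55,1234567890"]

regex_hits = [s for s in samples if whole_value_5.search(s)]
set_positions = [find_in_set("5", s) for s in samples]

print(regex_hits)
print(set_positions)
```

Both approaches agree on which lists contain 5 as a whole value, and neither is fooled by 15 or 55.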