Match Regex in MySQL for repeated word in one column - mysql

I'm having a query problem. I use MySQL as my database.
I want to use a regex to match the result I expect.
The table is:
table A
------------------------------
| ID | Description           |
------------------------------
| 1  | new 2 new 2 new 2 new |
| 2  | new 2 new 2 new       |
| 3  | new 2                 |
| 4  | 2 new 2new            |
------------------------------
The result I expect:
------------------------------
| ID | Description           |
------------------------------
| 2  | new 2 new 2 new       |
| 4  | 2 new 2new            |
------------------------------
The query I've tried so far:
SELECT * FROM a WHERE (description REGEXP '([^2][^0..9]])2( [^2][^0..9])([^2][^0..9]])2( [^2][^0..9])')
http://sqlfiddle.com/#!2/7d712/2
Could anyone help me to solve this :(?

Your regex isn't doing what you think it does (although I can't quite guess what you think it does...)
A translation of part of your regex:
([^2][^0..9]])2
means:
( # Start a group
[^2] # Match one character except "2"
[^0..9] # Match one character except "0", "." or "9"
] # Match "]"
) # End of group
2 # Match "2"
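To check this reading, the character classes can be tried out in isolation. The sketch below uses Python's re module as a stand-in for MySQL's regex engine (an assumption; the two engines differ in places, though not in how these simple bracket expressions are read), and the sample strings are made up for illustration.

```python
import re

# [^0..9] is a negated set holding just "0", "." and "9", so "5" still matches.
dotted = re.compile(r"[^0..9]")
# [^0-9] uses a range and therefore rejects every digit.
ranged = re.compile(r"[^0-9]")

sample_chars = "0.95a"  # made-up test characters
dotted_results = {ch: bool(dotted.fullmatch(ch)) for ch in sample_chars}
ranged_results = {ch: bool(ranged.fullmatch(ch)) for ch in sample_chars}

# The whole group from the question needs a literal "]" right before the 2.
group_matches = bool(re.fullmatch(r"([^2][^0..9]])2", "a5]2"))

print(dotted_results)
print(ranged_results)
print(group_matches)
```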

As @Tim Pietzcker pointed out, your regular expression does not do what you may think it does. If I understand correctly, I believe you are looking for the following regular expression, which matches descriptions containing exactly two 2s. It returns IDs 2 and 4.
^[^2]*2[^2]*2[^2]*$
Your SQL query would be:
SELECT * FROM a WHERE (description REGEXP '^[^2]*2[^2]*2[^2]*$')
SQL Fiddle
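The pattern can be sanity-checked outside the database. This is a small sketch that runs it over the four descriptions from the question with Python's re module, used here only as a stand-in for MySQL's REGEXP.

```python
import re

# Rows copied from table A in the question.
rows = {
    1: "new 2 new 2 new 2 new",
    2: "new 2 new 2 new",
    3: "new 2",
    4: "2 new 2new",
}

# Any run of non-2 characters, a 2, more non-2s, a second 2, then non-2s to the end.
pattern = re.compile(r"^[^2]*2[^2]*2[^2]*$")

matching_ids = [row_id for row_id, description in rows.items() if pattern.search(description)]
print(matching_ids)
```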

Related

getting the new row id from pySpark SQL write to remote mysql db (JDBC)

I am using pyspark-sql to create rows in a remote mysql db, using JDBC.
I have two tables, parent_table(id, value) and child_table(id, value, parent_id), so each row of parent_table may have as many rows in child_table associated with it as needed.
Now I want to create some new data and insert it into the database. I'm using the code guidelines here for the write operation, but I would like to be able to do something like:
parentDf = sc.parallelize([5, 6, 7]).toDF(('value',))
parentWithIdDf = parentDf.write.mode('append') \
    .format("jdbc") \
    .option("url", "jdbc:mysql://" + host_name + "/" + db_name) \
    .option("dbtable", table_name) \
    .option("user", user_name).option("password", password_str) \
    .save()
# The assignment above is wrong, as pyspark.sql.DataFrameWriter#save doesn't return anything.
I would like a way for the last line of code above to return a DataFrame with the new row ids for each row so I can do
childDf = parentWithIdDf.flatMap(lambda x: [[8, x[0]], [9, x[0]]])
childDf.write.mode('append')...
meaning that at the end I would have in my remote database
parent_table
____________
| id | value |
____________
| 1 | 5 |
| 2 | 6 |
| 3 | 7 |
____________
child_table
________________________
| id | value | parent_id |
________________________
| 1 | 8 | 1 |
| 2 | 9 | 1 |
| 3 | 8 | 2 |
| 4 | 9 | 2 |
| 5 | 8 | 3 |
| 6 | 9 | 3 |
________________________
As I've written in the first code snippet above, pyspark.sql.DataFrameWriter#save doesn't return anything according to its documentation, so how can I achieve this?
Am I doing something completely wrong? It looks like there is no way to get data back from a Spark action (which save is), while I would like to use this action as a transformation, which leads me to think I may be approaching all this the wrong way.
A simple answer is to use a timestamp plus an auto-incrementing number to create a unique ID yourself. This only works if only one server is writing at any given time.
:)
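One way to read this suggestion is to generate the ids on the client before writing, so the parent ids are already known when the child rows are built. Below is a minimal plain-Python sketch of that idea. The helper make_id_generator and its bit layout are hypothetical, nothing here calls Spark, and it assumes a single writer process and id columns wide enough for 64-bit integers.

```python
import itertools
import time


def make_id_generator(start_ms=None):
    """Return a function that yields increasing ids built from a timestamp and a counter.

    Hypothetical helper: ids are only unique within this one generator.
    """
    base = int(time.time() * 1000) if start_ms is None else start_ms
    counter = itertools.count()

    def next_id():
        # Timestamp in the high bits, running counter added on top.
        return (base << 20) + next(counter)

    return next_id


# A fixed start_ms keeps the example reproducible; normally it would be omitted.
next_id = make_id_generator(start_ms=1_700_000_000_000)

# Parent rows as (id, value); the ids exist before anything is written.
parent_rows = [(next_id(), value) for value in (5, 6, 7)]

# Child rows as (id, value, parent_id), two children per parent.
child_rows = [
    (next_id(), child_value, parent_id)
    for parent_id, _ in parent_rows
    for child_value in (8, 9)
]

print(parent_rows)
print(child_rows)
```

Both lists could then be turned into DataFrames and appended with the same kind of JDBC write as in the question, since no id has to come back from the database.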

Using regular expression to search and replace in MySQL

I am trying to write a query in MySQL to update a field by adding a new text string to the start of its existing text, creating a new line, and then adding the original text.
I have looked into regexp, but am unsure as to whether I am using it properly, as my query doesn't seem to work with the match.
The query I am using is as follows:
UPDATE table_name SET field = REPLACE(field, 'REGEXP ^', ''new_text' REGEXP \n field');
The ^ represents the beginning of the string(start of the text) that I wish to update in the field.
The replacing text consists of the new_text string, the new line (\n) and the original field.
I am relatively new to MySQL, so any insight into the structure of this query and whether it is possible to implement would be greatly appreciated.
You don't need regular expressions for this.
If you have a table of people
+----+-------+
| id | name  |
+----+-------+
|  1 | Bob   |
|  2 | Frank |
+----+-------+
Doing
UPDATE people SET name = CONCAT('Mr. ', name) WHERE id = 1;
Will give you
+----+---------+
| id | name    |
+----+---------+
|  1 | Mr. Bob |
|  2 | Frank   |
+----+---------+
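The same idea covers the case from the question, where the new text and a newline go in front of the existing value. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL so it can run anywhere. SQLite spells concatenation with || and the newline as char(10), whereas in MySQL the equivalent statement would use CONCAT, as shown in the comment. The value 'new_text' is the placeholder from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Bob"), (2, "Frank")])

# MySQL equivalent: UPDATE people SET name = CONCAT('new_text', '\n', name) WHERE id = 1;
conn.execute(
    "UPDATE people SET name = ? || char(10) || name WHERE id = 1",
    ("new_text",),
)

names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY id")]
print(names)
```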

failed matching regex in mysql

I'm having a query problem. I use MySQL as my database. I want to use a regex to match the result I expect. The table is:
table A
-------------------------------------------
| ID | Description                        |
-------------------------------------------
| 1  | 29th Marine Regiment/1st Bn (1/29) |
| 2  | new 21 new 2 new                   |
| 3  | new 2th 2 (2/2)                    |
| 4  | 2new 2new (2/2)                    |
| 5  | new2 new 2new                      |
-------------------------------------------
The result I need to get:
-------------------------------------------
| ID | Description                        |
-------------------------------------------
| 1  | 29th Marine Regiment/1st Bn (1/29) |
-------------------------------------------
I get the correct result with the regex
^(29[^0-9]+)[^0-9]+1[^0-9]+\([a-zA-Z0-9/-]*\)\s*$
in http://www.regexr.com/ but I don't get the result in MySQL.
http://sqlfiddle.com/#!2/7bd4b/2
Does anyone know why this happens?
(Turning my comment into an answer.)
You need to escape backslashes! E.g. \( → \\(.
And there are some other issues:
- The group in the beginning is not used anywhere, so it can be removed.
- [^0-9]+[^0-9]+ (once the group has been removed) is rather redundant. It can be shortened to [^0-9]{2,}, but in this case it looks like [^0-9]+ will suffice.
- The fiddle version had some strange {}s that had to be removed.
The result is thus:
'^29[^0-9]+1[^0-9]+\\([a-zA-Z0-9/-]*\\)\\s*$'
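The effect of the doubled backslashes can be illustrated outside MySQL. In the sketch below, mysql_unescape is a hypothetical helper that only roughly models how a MySQL string literal is read (a backslash is dropped and the next character kept), and Python's re stands in for the regex engine. It runs both the single-backslash and the double-backslash literal against the rows from the question.

```python
import re

# Rows copied from table A in the question.
rows = {
    1: "29th Marine Regiment/1st Bn (1/29)",
    2: "new 21 new 2 new",
    3: "new 2th 2 (2/2)",
    4: "2new 2new (2/2)",
    5: "new2 new 2new",
}


def mysql_unescape(literal):
    # Rough model (assumption): drop each backslash and keep the character after it.
    # Special escapes such as \n or \t are not modelled.
    return re.sub(r"\\(.)", r"\1", literal)


def matching_ids(sql_literal):
    pattern = re.compile(mysql_unescape(sql_literal))
    return [row_id for row_id, text in rows.items() if pattern.search(text)]


single = r"^29[^0-9]+1[^0-9]+\([a-zA-Z0-9/-]*\)\s*$"      # as tested on regexr
doubled = r"^29[^0-9]+1[^0-9]+\\([a-zA-Z0-9/-]*\\)\\s*$"  # as in the answer

ids_single = matching_ids(single)    # parentheses lose their backslashes and become a group
ids_doubled = matching_ids(doubled)  # the engine receives \( and \) as intended

print(ids_single, ids_doubled)
```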

Match Regex in MySQL for repeated word

I'm having a query problem. I use MySQL as my database. I want to use a regex to match the result I expect. The table is:
table A
------------------------------
| ID | Description           |
------------------------------
| 1  | new 2 new 2 new 2 new |
| 2  | new 21 new 2 new      |
| 3  | new 12th 2            |
| 4  | 2new 2new             |
| 5  | new2 new 2new         |
------------------------------
The result I expect:
- the digit 2 may appear only twice
- the character after/before a 2 must be a letter (except after whitespace)
Table B
------------------------------
| ID | Description           |
------------------------------
| 4  | 2new 2new             |
| 5  | new2 new 2new         |
------------------------------
The query I've got so far:
SELECT * FROM a WHERE
(description REGEXP '^[^2]*2[^2]*2[^2]*$')
click here for sqlfiddle demo
could anyone help me to solve this?
Use the regex below to get the descriptions for IDs 4 and 5.
SELECT * FROM a WHERE
(description REGEXP '^2[^2]*2[^2]*|\w+2[^2]*2[^2]*$')
http://sqlfiddle.com/#!2/1284e/18
Explanation:
Split the regex above at the | into two parts: 2[^2]*2[^2]* and \w+2[^2]*2[^2]*. In a regex, ^ anchors the start of the string and $ anchors the end.
2[^2]*2[^2]*
2 Matches the digit 2.
[^2]* Matches any character other than 2, zero or more times.
2 Matches the digit 2.
[^2]* Matches any character other than 2, zero or more times.
This gets you the 4th ID.
| The alternation (logical OR) operator, which combines two regexes: match either the one before it or the one after it.
\w+2[^2]*2[^2]*
\w+2 Matches one or more word characters followed by the digit 2. In your example, the 5th ID satisfies this part.
[^2]* Matches any character other than 2, zero or more times.
2 Matches the digit 2.
[^2]* Matches any character other than 2, zero or more times.
This gets you the 5th ID.
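Two details of this pattern may be worth checking before relying on it. The alternation binds more loosely than the anchors, so ^ applies only to the first branch and $ only to the second. Also, as far as I know, MySQL drops a lone backslash inside a string literal, so '\w' would reach the regex engine as a plain w rather than as a word-character class. The sketch below compares both readings on the rows from the question, with Python's re as a stand-in for the engine and mysql_unescape as a hypothetical, simplified model of the literal parsing.

```python
import re

# Rows copied from table A in the question.
rows = {
    1: "new 2 new 2 new 2 new",
    2: "new 21 new 2 new",
    3: "new 12th 2",
    4: "2new 2new",
    5: "new2 new 2new",
}


def mysql_unescape(literal):
    # Rough model (assumption): drop each backslash and keep the character after it.
    return re.sub(r"\\(.)", r"\1", literal)


def matching_ids(pattern):
    return [row_id for row_id, text in rows.items() if re.search(pattern, text)]


as_written = r"^2[^2]*2[^2]*|\w+2[^2]*2[^2]*$"
as_seen_by_engine = mysql_unescape(as_written)  # \w collapses to a literal w

ids_literal_w = matching_ids(as_seen_by_engine)  # what the query likely evaluates
ids_word_class = matching_ids(as_written)        # \w as a real word-character class

print(as_seen_by_engine)
print(ids_literal_w, ids_word_class)
```

Under these assumptions the query returns IDs 4 and 5 because both sample rows happen to have a w right before a 2, while a true word-character class would also let "new 12th 2" through via "12".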

MySQL - IF something, THEN also select where

So, I have a confusing MySQL issue. I feel like I need to use some IF statements, but I'm really not sure how to implement them into this situation! First, consider the following query. It's simple:
SELECT *
FROM flow
INNER JOIN flow_strings
USING(node_id)
WHERE
(
flow.parent = 0
OR flow.parent = :user_flow
)
AND flow.source = 0
AND :input LIKE flow_strings.sql_regex
However, I need to expand it, and that's where I'm stuck. Thinking through this, I'm not really sure how to explain it, so following are the table structures, and then some examples.
TABLE flow
+---------+--------+--------+
| node_id | parent | source |
+---------+--------+--------+
| 1 | 0 | 0 |
+---------+--------+--------+
| 2 | 0 | 0 |
+---------+--------+--------+
| 3 | 1 | 1 |
+---------+--------+--------+
| 4 | 3 | 0 |
+---------+--------+--------+
TABLE flow_strings
+----------------+---------+-----------+
| flow_string_id | node_id | sql_regex |
+----------------+---------+-----------+
| 1 | 1 | fish |
+----------------+---------+-----------+
| 2 | 1 | wish |
+----------------+---------+-----------+
| 3 | 1 | *yes* |
+----------------+---------+-----------+
| 4 | 2 | *no* |
+----------------+---------+-----------+
| 5 | 2 | nay |
+----------------+---------+-----------+
| 6 | 3 | *herp* |
+----------------+---------+-----------+
[ ... ]
TABLE placeholder_variables
+-------------+--------+------+-------+
| variable_id | source | name | value |
+-------------+--------+------+-------+
| 1 | 0 | yes | sure |
+-------------+--------+------+-------+
| 2 | 0 | yes | yeah |
+-------------+--------+------+-------+
| 3 | 0 | no | nope |
+-------------+--------+------+-------+
| 4 | 1 | herp | derp |
+-------------+--------+------+-------+
NOW, here's what I need to happen based on :input.
"fish", "wish", "sure", or "yeah" ---SELECT flow.node_id 1
This is because "fish", "wish", and "*yes*" are all associated with flow.node_id 1. Note that *yes* is surrounded by asterisks, so instead of "yes" being interpreted literally, it instead draws the values from placeholder_variables.
"nope" or "nay" ---SELECT flow.node_id 2
This is because "*no*" and "nay" are associated with flow.node_id 2. Again, because of the asterisks, "no" is not interpreted literally, but "nope" matches because "no" is in the placeholder_variables table, even though "nope" is not in the flow_strings table.
"no" and "*no*" ---NO MATCH
Even though *no* is in flow_strings, it should not match because it has asterisks around it (and a corresponding placeholder_variable) which means it should not be interpreted literally, and so can only be evaluated by its corresponding placeholder variable's value(s).
"baby" ---NO MATCH
Even though "baby" does not have asterisks around it, it corresponds to flow.node_id 3, and that node's flow.source is 1.
"derp" ---NO MATCH
This is because placeholder_variables.source is 1 for *herp*, even though it is in the flow_strings table.
"*herp*" ---SELECT flow.node_id 4
Even though there are asterisks around *herp* in the flow_strings table, the corresponding placeholder_variable.source is 1.
** TO SUM UP **
- No source = 1
- Interpret placeholder_variables if the sql_regex is surrounded by asterisks, but only if the corresponding placeholder_variable's source is 0.
- If source is 0, and no asterisks are present, interpret sql_regex literally.
I know that I can use MySQL's SUBSTRING() to work with the asterisks. I also know that, (as I am using PHP) I could theoretically split this into two queries, and then dynamically generate the second query by looping through the first. However, this would be both A) memory intensive, and B) sloppy.
So my question is this: Is it possible to do this using MySQL alone? If yes, how would you recommend I format it? You don't need to write the query for me, but if you could help me with some of the logic, I'd be very grateful! I have absolutely no idea what to try besides what I have already outlined, and I definitely don't want to do that.
Thanks!
You can use this solution:
SELECT a.*,
       -- If the LEFT JOIN succeeded (sql_regex was a variable), display the "value" column
       -- from placeholder_variables; otherwise (sql_regex was a literal) display sql_regex.
       COALESCE(c.value, b.sql_regex) AS string
FROM flow a
JOIN flow_strings b ON a.node_id = b.node_id
LEFT JOIN placeholder_variables c
       ON b.sql_regex LIKE '*%*'                   -- sql_regex is a variable
      AND REPLACE(b.sql_regex, '*', '') = c.name   -- strip the asterisks so it can match "name"
      AND c.source = 0
WHERE a.source = 0
  -- If the string was interpreted as a placeholder variable, compare against its
  -- "value" column; otherwise compare against the literal sql_regex.
  AND COALESCE(c.value, b.sql_regex) = :input
SQLFiddle Demo
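To see how the LEFT JOIN plus COALESCE behaves on the sample data, here is a small sketch using Python's built-in sqlite3 as a stand-in for MySQL (an assumption made so the example runs without a server; the query uses only constructs both engines share). It loads just the rows shown in the question, leaves out the elided ones, and selects only node_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flow (node_id INTEGER, parent INTEGER, source INTEGER);
CREATE TABLE flow_strings (flow_string_id INTEGER, node_id INTEGER, sql_regex TEXT);
CREATE TABLE placeholder_variables (variable_id INTEGER, source INTEGER, name TEXT, value TEXT);
INSERT INTO flow VALUES (1,0,0),(2,0,0),(3,1,1),(4,3,0);
INSERT INTO flow_strings VALUES
  (1,1,'fish'),(2,1,'wish'),(3,1,'*yes*'),(4,2,'*no*'),(5,2,'nay'),(6,3,'*herp*');
INSERT INTO placeholder_variables VALUES
  (1,0,'yes','sure'),(2,0,'yes','yeah'),(3,0,'no','nope'),(4,1,'herp','derp');
""")

QUERY = """
SELECT DISTINCT a.node_id
FROM flow a
JOIN flow_strings b ON a.node_id = b.node_id
LEFT JOIN placeholder_variables c
       ON b.sql_regex LIKE '*%*'
      AND REPLACE(b.sql_regex, '*', '') = c.name
      AND c.source = 0
WHERE a.source = 0
  AND COALESCE(c.value, b.sql_regex) = :input
ORDER BY a.node_id
"""


def nodes_for(text):
    """Return the node ids the query selects for one :input value."""
    return [row[0] for row in conn.execute(QUERY, {"input": text})]


inputs = ("fish", "sure", "nope", "nay", "no", "*no*", "derp")
results = {text: nodes_for(text) for text in inputs}
print(results)
```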