Finding duplicate consecutive strings in vim using vim regex - html

So let's say I have this in my search file
Foo
Bez, Bez
Foobar
Foo
I want to search for Bez, Bez by using a regex.
This is what I have and I know it's not even remotely correct.
:%s/\([a-zA-Z]\),\([a-zA-Z])/\1,\1,\1/g
So basically what I want to do is make "Bez, Bez" into "Bez, Bez, Bez"
Really, I'm stumped on how to find 2 consecutive equivalent strings.

what about:
%s/\(\w\+\), \1/\1, \1, \1/g
it captures the expression between the parenthesis even before ending the expression whole match, pretty neat huh?.

You use capturing groups such as:
(\w+)\W+\1
but I don't recall the vim equivalent for such regex expression.
I tested using RegexPal and the input you gave
Edit
Found Back References in Vim

Related

Equalent mysql regex for following python regex

python pattern => ^(?=.\bABDUL\b)(?=.\bHAI\b.)(?=.\bMANSOOR\b).*$
need equalent mysql pattern
can you please help me out ?
The regex in question is a quite strange way how to match simple words. It is not clear what is the expected input. Maybe, the input justifies this approach.
^(?=.\bABDUL\b)(?=.\bHAI\b.)(?=.\bMANSOOR\b).*$
Which means: At the beginning there must be any character which is not a part of a word, then ABDUL, a non word character, HAI, a non word character, MANSOOR, a non word character or the end of the string.
^[^[:alnum:]]ABDUL[^[:alnum:]]HAI[^[:alnum:]]MANSOOR([^[:alnum:]]?.*)?$
Which is: At the beginning, not a number or alphabet character (alphanumerical), ABDUL, one non-alphanumerical, HAI, one non-alphanumerical, MANSOOR one non-alphanumerical or the end of the string.
I did not test it and did not intended to make it 100% the same as the first one, but it should be close enough.
For anyone who would like to copy it to their code:
Matching the first character is not very common and can be a bug in the original regexp.
(?=...) is an "lookahead assertion" which does not consume any characters, the POSIX version does not have it, but for a simple string searching it may not be important.
Both versions should match strings like !ABDUL$HAI)MANSOOR - make sure that this is what you want.
For someone who would like to understand the regular expressions I used
https://dev.mysql.com/doc/refman/8.0/en/regexp.html for mysql (POSIX syntax) and https://docs.python.org/3/library/re.html for python (PCRE = Perl compatible syntax)

MySQL matching this regex while it shouldn't

I'm trying to recognize quoting (citing) somebody's else sentence in a markdown text, which I have in my local copy of MySQL GHTorrent dataset. So I wrote this query:
select * from github_discussions where body rlike '(.)*(\s){1,}(>)(\s){1,}(.)+';
it matches some unwanted data, which according to https://regex101.com/, it should not with this particular regular expression.
Test string:
`Params` is plural -> contain<s>s</s>
Matched on MySQL database, not matched at regex101 dot com.
Obvious example of quoting, but not matched at db:
Yes, I believe so.\r\n\r\n\r\n\r\nK\r\n\r\n> On 19-Jul-2014, at 17:33, Stefan Karpinski <notifications#github.com> wrote:\r\n> \r\n> This is the standard 3-clause BSD license, right?\r\n> \r\n> —\r\n> Reply to this email directly or view it on GitHub.
Moreover, MySQL workbench didn't show those return carriage and new line symbols unless copy-pasted here.
Can I normalize (remove \r and \n) with some update query ?
Is MySQL regex implementation different from POSIX standard regex ?
Do you have by any chances maximally clean solution for recognizing quoting in a markdown text ?
Thanks!
You've got an awful lot of parens in there. Try this as functionally what you have above:
select * from github_discussions where body rlike '.*[:blank:]+>[:blank:]+.+'
However, I'm not sure that's really what you want. This would happily match this line:
this is before > and after
which by my understanding is not a quoted string in markdown. Instead I would anchor it at the beginning like this:
select * from github_discussions where body rlike '^[:blank:]*>[:blank:]+'
That will match a greater-than sign at the beginning of the line, optionally preceded by whitespace. Is that what you are looking for?
I'm not sure if your data has newlines embedded. If so, you may need to look into ways of having your regex identify newlines using the ^ anchoring symbol. As is the well accepted conclusion in regex literature, that is left as an exercise for the student. :-)

CTRL+F search for "xx *anything* xx"

I am trying to find and replace all for loops in my (js) code with slightly different syntax. I want to find every for loop that used the syntax "for ( any code here ){". Is there a way to find all such instances?
That's a regular expression question I think. In SublimeText2 start the search functionality. Make sure regular expressions are on (first button, labeled .*) and the search for for\s*\(.*?\)\s*\{.
Enable the regex option and type for \(.+\)\{
Explanation:
the backslashes "escape" the parentheses and brace. In other words, they tell the regex that those are the characters within the search and not part of a regex command. The . searches for any character and the + modifies that to include one or more instances of any character.
Here's a screen shot of sublime text
You want to search by regular expression. Notepad++ supports this, not sure about Sublime Text but I would image it does also. With regular expression enabled, search for
xx.+xx
This will search for the characters xx, followed by any character (.) as many times as it can find it (+), followed by the characters xx. This should give you the result you are looking for.
Here is a article with some information about using regular expressions in Notepad++

Regex for start with three alpha and four digits

I have writen an sql statement to retrieve data from Mysql db and I wanted to select data where myId start with three alpha and 4 digits example : ABC1234K1D2
myId REGEXP '^[A-Z]{3}/d{4}'
but it gives me empty result(data is available in DB). Could someone point me to correct way.
In most regex variants the answer would be: /d matches a / followed by a d; I think you want \d which matches a digit.
However MySQL has a somewhat limited regex implementation (see documentation).
There is no shortcut to character sets like \d for any digit.
You need to either use a named character set ([[:digit:]]), or just use [0-9].
Try this out :
[A-Z]{3}[0-9]{4}
If you want characters to be case insensitive. Try this :
[a-zA-Z]{3}[0-9]{4}
First, in regular regular expressions, to match a digit, you have to use \d instead of /d (which makes you match / followed by d).
Then, I had never noticed, but I think \d (and the others like \w, etc.) don't seem to be available in MySQL. The doc lists the accepted spacial chars, and those generic classes don't appear. You could use [:digit:] instead, even if [0-9] is quite shorter ;)
You are doing fine, just replace /d with \d.Final regex: ^[A-Z]{3}\d{4}
You could use the following pattern :
^[a-zA-Z]{3}\d{4}

Find and Replace with Notepad++

I have a document that was converted from PDF to HTML for use on a company website to be referenced and indexed for search. I'm attempting to format the converted document to meet my needs and in doing so I am attempting to clean up some of the junk that was pulled over from when it was a PDF such as page numbers, headers, and footers. luckily all of these lines that need to be removed are in blocks of 4 lines unfortunately they are not exactly the same therefore cannot be removed with a simple literal replace. The lines contain numbers which are incremental as they correlate with the pages. How can I remove the following example from my html file.
Title<br>
10<br>
<hr>
<A name=11></a>Footer<br>
I've tried many different regular expression attempts but as my skill in that area is limited I can't find the proper syntax. I'm sure i'm missing something fairly easy as it would seem all I need is a wildcard replace for the two numbers in the code and the rest is literal.
any help is apprciated
The search & replace of npp is quite odd. I can't find newline charactes with regular expression, although the documentation says:
As of v4.9 the Simple find/replace (control+h) has changed, allowing the use of \r \n and \t in regex mode and the extended mode.
I updated to the last version, but it just doesn't work. Using the extended mode allows me to find newlines, but I can't specify wildcards.
However, you can use the macros to overcome this problems.
prepare a search that will find a unique passage (like Title<br>\r\n, here you can use the extended mode)
start recording a macro
press F3 to use your search
mark the four lines and delete them
stop recording the macro ... done!
Just replay it and it deletes what you wanted to delete.
If I have understood your request correctly this pattern matches your string:
Title<br>( ?)\n([0-9]+)<br>( ?)\n<hr>( ?)\n<A name=([0-9]+)></a>Footer<br>
I use the Regex Coach to try out complicated regex patterns. Other utilities are available.
edit
As I do not use Notepad++ I cannot be sure that this pattern will work for you. Apologies if that transpires to be the case. (I'm a TextPad man myself, and it does work with that tool).