Alternative to lookbehind with variable width - html

I have some html which contains a number of hyperlinks to html files, but they don't have any file extensions.
For example in the string <a href='variablelengthfilename'> I'm trying to match the trailing ' , so I can replace it with .html' (using a RegEx search in Notepad++) using something like this:
`(?<=href='[A-Za-z]*)'`
but that won't work because Notepad++ doesn't allow variable-length lookbehind assertions.
How else can I achieve this?
Thanks

Since you are working in Notepad++, here is a way to achieve what you are after:
Find what: \bhref='[^']*
Replace with: $&.html
The \bhref='[^']* regex matches a href as a whole word, then =' are matched literally, and [^']* matches 0 or more characters other than '. Note you will need to replace ' with " if the href value is inside double quotes.
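For anyone who wants to check the pattern outside Notepad++, here is a minimal Python sketch of the same find/replace. The sample string is made up for illustration, and \g<0> is Python's way of writing the whole-match reference that Notepad++ spells $&.

```python
import re

# Hypothetical sample input, modelled on the example in the question.
html = "<a href='variablelengthfilename'>link</a>"

# Match href=' plus everything up to (not including) the closing quote,
# then re-insert the whole match followed by .html
result = re.sub(r"\bhref='[^']*", r"\g<0>.html", html)

print(result)  # <a href='variablelengthfilename.html'>link</a>
```

Because the closing quote is never consumed, it stays in place right after the appended .html.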

Assuming all your links look like that, why not just do a simple replace
'>
with
.html'>
?
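A small Python sketch of that plain-text replace, mainly to show the assumption it relies on. The sample markup is invented: the second tag is there to show that any attribute ending in '> gets the suffix, not only href.

```python
# Hypothetical sample: one link plus one unrelated tag whose attribute
# also happens to end with '>
html = "<a href='page'>link</a><p class='note'>text</p>"

# Plain string replacement, no regex involved.
result = html.replace("'>", ".html'>")

print(result)
# <a href='page.html'>link</a><p class='note.html'>text</p>
```

So this is only safe when every '> in the file closes an href value.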

Related

Regex that matches any HTML tag with the content inside

I'd like to use Regex to match HTML tag "head" and text inside them so I can delete them easily. I'm using a find and replace tool that is utilizing regex syntax and it really works great in replacing multiple files at once.
I tried doing a lot of syntax but I always fail.
http://regex101.com/r/aZ6pN5/2
Anyone can help please?
Replace .* in your regex with [\S\s]*? so that it also matches line breaks. The s (DOTALL) modifier was not available in JavaScript before ES2018, so this character class is the portable workaround.
<head.*?>([\s\S]*?)<\/head>
[\s\S]*? does a non-greedy match of zero or more characters of any kind (whitespace or non-whitespace), so it also spans line breaks.
Or, to replace the contents of the head tag while keeping the tags themselves:
(<head\b[^<>]*>)[\s\S]*?(<\/head>)
Replacement string:
$1stringyouwant$2
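As a rough check of the second pattern, here is a Python sketch that runs it over a made-up multi-line page. The page content is hypothetical, and \g<1> and \g<2> are Python's equivalents of $1 and $2.

```python
import re

# Hypothetical page; the head block spans several lines on purpose.
page = """<html>
<head lang="en">
<title>Old title</title>
</head>
<body>text</body>
</html>"""

# Group 1 keeps the opening tag (with attributes), group 2 the closing tag.
# [\s\S]*? lazily consumes everything in between, line breaks included.
pattern = re.compile(r"(<head\b[^<>]*>)[\s\S]*?(</head>)")
result = pattern.sub(r"\g<1>stringyouwant\g<2>", page)

print(result)
```

Using an empty replacement string instead removes the head element entirely, which is what the question asked for.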

Sublime Text regex to find and replace whitespace between two xml or html tags?

I'm using Sublime Text and I need to come up with a regex that will find the whitespaces between a certain opening and closing tag and replace them with commas.
Example: Replace white space in
<tags>This is an example</tags>
so it becomes
<tags>This,is,an,example</tags>
Thanks!
You just have to use a simple regex like:
\s+
And replace it with a comma.
This will find instances of
<tags>...</tags>
with whitespace between the tags:
(<tags>\S+)\s(.+</tags>)
This will replace the first whitespace with a comma:
\1,\2
Open Find and Replace [OS X Cmd+Opt+F :: Windows Ctrl+H]
Use the two values above to find and replace and use the 'Replace All' option. Repeat until all the whitespaces are converted to commas.
The best answer is probably a quick script but this will get you there fairly fast without needing to do any coding.
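A quick script along those lines could look like the Python sketch below. It is only one possible approach, not what the answer's author had in mind: the function name commas_inside and the sample text are made up, and it assumes the tags are not nested.

```python
import re

def commas_inside(text, tag="tags"):
    """Replace each run of whitespace between <tag> and </tag> with a comma."""
    name = re.escape(tag)
    # Capture the opening tag, the (lazy) inner text, and the closing tag.
    pattern = re.compile(rf"(<{name}>)(.*?)(</{name}>)", re.DOTALL)

    def fix(match):
        inner = re.sub(r"\s+", ",", match.group(2))
        return match.group(1) + inner + match.group(3)

    return pattern.sub(fix, text)

sample = "<tags>This is an example</tags> outside text"
print(commas_inside(sample))  # <tags>This,is,an,example</tags> outside text
```

Text outside the tags is left alone, and everything is done in a single pass instead of repeating Replace All.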
You can replace any one or more whitespace chunks in between two tags using a single regular expression:
(?s)(?:\G(?!\A)|<tags>(?=.*?</tags>))(?:(?!</?tags>).)*?\K\s+
Details:
(?s) - a DOTALL inline modifier, makes . match line breaks
(?:\G(?!\A)|<tags>(?=.*?</tags>)) - either the end of the previous successful match (\G(?!\A)) or (|) <tags> substring that is immediately followed with any zero or more chars, as few as possible and then </tags> (see (?=.*?</tags>))
(?:(?!</?tags>).)*? - any char that does not start a <tags> or </tags> substrings, zero or more occurrences but as few as possible
\K - match reset operator
\s+ - one or more whitespaces (NOTE: use \s if each whitespace must be replaced).

Regular expression to match html tags

Just wanted to know if this is the right way to write a regular expression for an opening HTML tag <strong>: /<strong[^>]*/i?
What I am trying to do is have a pattern in place for HTML tags and then use it to match any HTML document.
Thanks in advance!
Close.
It would be like this for the opening tag:
/<strong[^>]*?>/i
Keep in mind that using Regex on HTML which involves tags nested within themselves can get very messy.
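To see what the corrected pattern picks up, here is a short Python sketch. The sample markup is invented, and re.IGNORECASE stands in for the /i flag.

```python
import re

# Same pattern as above; re.IGNORECASE plays the role of the /i flag.
opening_strong = re.compile(r"<strong[^>]*?>", re.IGNORECASE)

# Hypothetical sample with mixed case and an attribute.
html = '<p><STRONG class="x">bold</STRONG> and <strong>more</strong></p>'

found = opening_strong.findall(html)
print(found)  # ['<STRONG class="x">', '<strong>']
```

Closing tags are skipped because they start with </ rather than <strong.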
OK. As I understand it, you want to match any string between the "<" and ">" symbols, for example <codekaro>.
To do so, you can use:
^[\<][A-Za-z]*[\>]$
Here, ^ anchors the match at the start of the string,
[\<] matches one occurrence of the < symbol (the \ escapes the < symbol),
[A-Za-z]* matches zero or more ASCII letters,
[\>] matches one occurrence of the > symbol (the \ escapes the > symbol),
$ anchors the match at the end of the string.
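A few quick checks of that pattern in Python, as a sketch with made-up test strings. They show its limits: it accepts only a bare tag made of letters, and it also accepts an empty <>.

```python
import re

tag_like = re.compile(r"^[\<][A-Za-z]*[\>]$")

# Hypothetical inputs and whether the whole string matches.
checks = {
    "<codekaro>": bool(tag_like.match("<codekaro>")),       # letters only
    "<>": bool(tag_like.match("<>")),                       # * allows zero letters
    "<a href='#'>": bool(tag_like.match("<a href='#'>")),   # space and attribute
    "text <b>": bool(tag_like.match("text <b>")),           # ^ requires < first
}
print(checks)
```

Only the first two entries come out True, so the pattern will not find tags with attributes or tags embedded in a larger document.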
I encourage you to use this link for regex tutorial and this link to check results of regular expression.
Hope this will help you..!!
Happy learning..!!

What is the appropriate regex string to match a specific html element?

I have a ton of text replacements to make and I would like to try and do this all at once instead of manually. I'm trying to replace <a class='stuff morestuff' href='#'>Some Text</a> with Some Text; essentially stripping off the surrounding anchor tag.
I've been messing around with a search/replace in Visual Studio using regex, but am not really getting anywhere. My latest attempt:
Find what:
\<a class='stuff morestuff' href='#'\>(.+)\<\/a\>
Replace with:
$1
If what I want to do is even feasible, how can I correct my regex to accomplish this?
This regex will match your anchors if the class and href are always the same:
Find: \<a[^\>]class='stuff morestuff' href='\#'[^\>]*\>(.*)\</a\>
Replace: $1
This regex will replace all the anchors with the inner text:
Find: \<a[^\>]*\>(.*)\</a\>
Replace: $1
I'm assuming from your post you plan to use this in Visual Studio's Find/Replace and not in code.
Find: \<a class='.*?' href='#'>(.*?)\</a\>
Replace: $1
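The difference between the greedy (.*) and lazy (.*?) captures in these answers matters when one line holds more than one anchor. Here is a Python sketch with an invented two-link sample; Python needs no backslash before < and >, and \1 plays the role of $1.

```python
import re

# Hypothetical line containing two anchors of the kind described.
html = ("<a class='stuff morestuff' href='#'>Some Text</a> and "
        "<a class='stuff morestuff' href='#'>More</a>")

lazy = re.compile(r"<a class='stuff morestuff' href='#'>(.*?)</a>")
greedy = re.compile(r"<a class='stuff morestuff' href='#'>(.*)</a>")

print(lazy.sub(r"\1", html))    # Some Text and More
print(greedy.sub(r"\1", html))  # second anchor's opening tag is left behind
```

The greedy version runs to the last </a> on the line, so it strips only the outermost pair of tags.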

Can you search html attributes using wildcards with ruby nokogiri

I know you can search text in HTML using wildcards. Can you also search for attribute values in HTML using wildcards with Nokogiri?
E.g., suppose I want to search for classes with value *session*.
You can use xpath contains() function to search the document. Something like:
doc.xpath("//*[@*[contains(., 'session')]]").each do |ele|
  # something
end
This search returns all the elements with any attribute whose value contains the string 'session'.
Had a similar problem a few days ago. Notice the spaces around the class values:
find(:xpath, "//*[contains(concat(' ', normalize-space(@class), ' '), ' icon-edit ')]")
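The two matching rules (substring in any attribute versus a whole class token) can be compared in a short Python sketch. This is not Nokogiri or XPath: it swaps in plain iteration with the standard-library xml.etree.ElementTree, whose XPath subset has no contains() function, and it assumes well-formed markup. The sample markup and variable names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup.
markup = """<div>
  <p class="user-session">a</p>
  <span data-id="session42">b</span>
  <a class="icon-edit big" href="#">c</a>
  <a class="icon-editor" href="#">d</a>
</div>"""
root = ET.fromstring(markup)

# Substring rule: any attribute whose value contains 'session'.
with_session = [el.tag for el in root.iter()
                if any("session" in value for value in el.attrib.values())]

# Token rule: class list contains exactly 'icon-edit' (not 'icon-editor').
icon_edit = [el.text for el in root.iter()
             if "icon-edit" in el.get("class", "").split()]

print(with_session)  # ['p', 'span']
print(icon_edit)     # ['c']
```

Splitting the class value on whitespace does the same job as the concat/normalize-space padding in the XPath expression: it keeps icon-editor from matching icon-edit.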