Regular Expression for HTML attributes - html

I need to write a regular expression to catch the following things in bold
class="something_A211"
style="width:380px;margin-top: 20px;"
I have no idea how to write it, can someone help me?
I need this because, in html file i have to replace (whit notepad++) with empty, so i want to have a clear < tr > or < td > or anything else.
Thank you

You can use a regex like this to capture the content:
((?:class|style)=".*?")
Working demo
However, if you just want to match and delete that you can get rid of capturing groups:
(?:class|style)=".*?"

For all constructions like something="data", you can use this.
[^\s]*?\=\".*?\"
https://regex101.com/r/oQ5dR0/1
The link shows you what everything does.
To explain it briefly, a non space character can come before the "=" any mumber of times, then comes the quotes and info inside of them.
The question mark in .*? (and character any number of times) is needed so only the minimum amount of characters will be used (instead of looking for the next possible quotes somewhere further along)

Related

HTML Regex Selector

I am trying to make a regex for HTML, I am coming up with a few minor issues regarding header html blocks to be selected and title in head for some reason,
To explain it better:
<h5>Thing</h5> will all be selected but I only want <h5> and </h5> selected and it's the same with <title>Test</title> I only want the html tags selected but it selects the whole thing,
here is my regex so far:
/(<\/(\w+)>)|(<(\w+)).+?(?=>)>|(<(\w+))>/ig
Your problem is here: <(\w+).+?(?=>)>
This says:
open an angle bracket
consume as many word characters as possible (min 1)
consume as few characters as possible (min 1)
make sure a closing angle bracket follows
consume the closing angle bracket
First of all, step 4 is superfluous; you know you will have a closing bracket next, otherwise step 5 will fail to match.
But the bigger problem is step 3. Let's see what happens on <h5>Thing</h5>:
<
h5 (because > is not a word character any more)
>Thing</h5, because this is the least amount matched before a closing angle bracket (remember, matching 0 characters here is not an option)
Make sure next is >
>
Anyway, in the simple case, what you want can be done by /<\/?.+?>/. This will break if attributes have values that include a greater than symbol: <div title="a>b">. Avoiding this is possible, but it makes the regexp a bit more complex, kind of like this (but I may have forgotten something):
<\w+(?:\s+\w+(?:=(?:"[^"]*"|'[^']*'|[^'"][^\s>]*)?)?)*\s*>|<\/\w+>

RegEx to substitute tag names, leaving the content and attributes intact

I would like to replace opening and closing tag, leaving the content of tags and its attribute intact.
Here is what I have:
<div class="QText">Text to be kept</div>
to be replaced with
<span class="QText">Text to be kept</span>
I tried this expression which finds all expressions I want but there seems to be no way to replace found expressions.
<div class="QText">(.*?)</div>
Thanks in advance.
I think #AmitJoki's answer will work well enough in certain circumstances, but if you only want to replace div elements when they have an attribute or a specific set of attributes, then you would want to use a regex replacement with backreferences - how you specify and refer to a backreference, unfortunately, depends upon your chosen editor. Visual Studio has the most unique and annoying "flavor" of regex I know of, while Dreamweaver has a fairly typical implementation (both as well as I imagine whatever editor you're using do regex replacement - you just have to know the menu item or keystroke to bring up the dialog).
If memory serves, Dreamweaver has replacement options when you hit Ctrl+F, while you have to hit Ctrl+H, so try those.
Once you get a "Find" and "Replace" box, you would put something like what you have in your last example above: <div class="QText">(.*?)</div> or perhaps <div class="(QText|RText|SText)">(.*?)</div> into your "Find" box, then put something like <span class="QText">\1</span> or <span class="\1">\2</span> in the "Replacement" box. A few utilities might use $1 to refer to a backreference rather than \1, but you'll have to lookup help or experiment to be sure.
If you are using a language to run this expression, you need to tell us which language.
If you are using a specific editor to run this expression, you need to tell us which editor.
...and never forget the prevailing wisdom on regex and HTML
Just replace div.
var s="<div class='QText'>Text to be kept</div>";
alert(s.replace(/div/g,"span"));
Demo: http://jsfiddle.net/9sgvP/
Mark it as answer if it helps ;)
Posted as requested
If its going to be literal like that, capture what's to be kept, then replace the rest,
Find: <div( class="QText">.*?</)div>
Replace: <span$1span>

Remove first line from HTML Markup Field using RegEx

I have a single text field that contains HTML markup. The system that generates this field content always seems to generate a first line with a non-visible carriage return value in it and I can't seem to prevent if from doing so.
Does anyone know of a way (perhaps using a Regular Expression), to remove that first line from this text field?
I'd prefer to leave all other instances of the carriage return values in the field as is, so if it's a RegEx statement that will just remove the first line of a text field, that would work for me.
Any suggestions most welcomed.
Cheers,
Wayne
Usually the trim (often removes whitespaces, CR ) method is used for this in many programming languages. You did not state in what language you will be doing this...

Regex all uppercase with special characters

I have a regex '^[A0-Z9]+$' that works until it reaches strings with 'special' characters like a period or dash.
List:
UPPER
lower
UPPER lower
lower UPPER
TEST
test
UPPER2.2-1
UPPER2
Gives:
UPPER
TEST
UPPER2
How do I get the regex to ignore non-alphanumeric characters also so it includes UPPER2.2-1 also?
I have a link here to show it 'real-time': http://www.rubular.com/r/ev23M7G1O3
This is for MySQL REGEX
EDIT: I didn't specify I wanted all non-alphanumeric characters (including spaces), but with the help of others here it led me to this: '^[A-Z-0-9[:punct:][:space:]]+$' is there anything wrong with this?
Try
'^[A-Z0-9.-]+$'
You just need to add the special characters to the group, optionally escaping them.
Additionally if you choose not to escape the -, be aware that it should be placed at the start or the end of the grouping expression to avoid the chance that it may be interpreted as delimiting a range.
To your updated question, if you want all non-whitespace, try using a group such as:
^[^ ]+$
which will match everything except for a space.
If instead what you wanted is all non-whitespace and non-lowercase, you likely will want to use:
^[^ a-z]+$
The 'trick' used here is adding a caret symbol after the opening [ in the group expression. This indicates that we want the negation of the match.
Following the pattern, we can also apply this 'trick' to get everything but lowercase letters like this:
^[^a-z]+$
I'm not really sure which of the 3 above you want, but if nothing else, this ought to serve as a good example of what you can do with character classes.
I believe you are looking for (one?) uppercase-word match, where word is pretty much anything.
^[^a-z\s]+$
...or if you want to allow more words with spaces, then probably just
^[^a-z]+$
You just need to put in the . and -. In theory, you don't need to escape because they are inside the brackets, but I like to to remind myself to escape when I have to.
'^[A-Z0-9\.\-]+$'
Try regular expression as below:
'^[A0-Z0\\.\\-]+$'

How do I check if values between html tags are blank or empty using regular expressions in notepad plus plus

I'm conducting a mass search of files in notepad++ and I need to determine if there are no values between a set of tags (i.e. ).
".*?" will search for 0 or more characters (well, most), which is fine. But I'm looking for a set of tags with at least one character between them.
".+?" is similar to the above and does work in notepad++.
I tried the following, which was unsuccessful:
<author>.{0}?</author>
Thank you for any help.
Since you look for something that doesn't exist you don't have to make it that complicated. Simply searching for <author></author> would do the trick, wouldn't it? If you want to include space-characters as "nothing" you could modify it to the following:
<author>\s*?</author>
Output:
<author></author> Match
<author> </author> Match
<author>something</author> No match
I don't understand why you are using the "?" operator; ".+" should yield the result you need.