Replace arbitrary number of plus signs with spaces - html

I have more than 2000 <img> tags and I want to replace the alt text of each one. The alt texts look like:
alt="pinblock"
alt="Rich+Austin+shop+4"
alt="hot+dry+sun+az"
I want a quick way to replace all '+' with space (' '), hence I'm using regex to fix this.
I've tried this so far:
Find what: alt="(\D+)[+](\D+)[+*](\D*)[+*](\D*)[+*](\D*)[+*](\D*)[+*](\D*)[+*](\D*)"\s
Replace With: alt="\1 \2 \3 \4 \5 \6 \7 \8"
I know I'm doing something wrong, please help.
Complete string would be:
<img border="0" height="111" src="https://1.bp.blogspot.com/-WL5_jMT96p4/U8ILVU9D-mI/AAAAAAAAGeI/rP_RJccbhj8/s1600/hot+dry+sun+az.jpg" alt="hot+dry+sun+az" width="200" />

Just to demonstrate, I have tested this out, and pressing the Replace All button repeatedly works (each pass replaces only one plus sign per alt attribute). First, select the Regular expression radio button, then:
Find what: (alt=\"[^+"]*?)\+([^\"]*?")
Replace with: \1 \2
This is of course not fool-proof, but it should work as long as you have no pathological data.
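To see why Replace All has to be pressed more than once, here is a minimal Python sketch of the same search and replace applied in a loop. It uses the pattern from above unchanged; the function name replace_until_stable is made up for illustration, and Python's re module stands in for the editor's regex engine, which may differ in details.

```python
import re

# Same pattern as in "Find what" above (quotes need no escaping here).
PATTERN = re.compile(r'(alt="[^+"]*?)\+([^"]*?")')


def replace_until_stable(text):
    """Apply the replacement repeatedly until a pass changes nothing.

    Returns the final text and the number of passes that made a change.
    """
    passes = 0
    while True:
        new_text, count = PATTERN.subn(r"\1 \2", text)
        if count == 0:
            return text, passes
        text = new_text
        passes += 1
```

Each pass replaces only the first remaining plus sign in every alt attribute, so an attribute with three plus signs needs three passes.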
NOTE: My first version had a bug in that it would change alt="hot+dry+sun+az" width="200+200" to alt="hot+dry+sun+az" width="200 200", which is a good example of why one should not use regex to process HTML. I think this task can probably be done in a few lines of JavaScript with zero danger of getting tripped up as I did above, but that's another question for another day!
NOTE 2: My second version also got Zalgoed.
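The note above suggests that a short script would be safer than repeated Replace All. A rough sketch of that idea, written in Python rather than JavaScript: it finds each alt attribute once and replaces every plus sign inside it in a single pass. The name fix_alt_text is hypothetical, and the sketch assumes double-quoted alt values with no embedded quotes.

```python
import re

# An alt attribute with a double-quoted value. The lookbehind keeps
# attributes such as data-alt="..." from being treated as alt="...".
ALT_ATTR = re.compile(r'(?<![\w-])(alt=")([^"]*)(")')


def fix_alt_text(html):
    """Replace every '+' inside alt="..." values with a space."""
    def repl(match):
        # Only the attribute value (group 2) is modified.
        return match.group(1) + match.group(2).replace("+", " ") + match.group(3)

    return ALT_ATTR.sub(repl, html)
```

Because the replacement touches only the captured alt value, plus signs in src or width are left alone.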

How can I search in Visual Studio Code/IDE for a specific term which does not include another term?

I would like to know if it's possible to search my code for a term ("<img") on lines that do not also contain another term ("alt=").
In other words, I would like to find all image tags that do not have the alt attribute, and to know how to do it.
For example, I need to find these:
<img src="some-image.png" />
But in this search I need to ignore those which already have the attribute:
<img src="another-image.png" alt="Some text" />
Is this possible on Visual Studio Code? If not, is it possible on the Visual Studio (IDE)?
In the VS Code search box (Ctrl+F), or in the search sidebar, there are 3 icons in the search text box. The first is Match Case, the second is Match Whole Word, and the third is Use Regular Expression. That is the one you want to select. After selecting it, paste this regex into the search text box.
^(?!.*?alt=).*img\ssrc.*
You can search the document using Ctrl + f or search everywhere with the sidebar search tool and it will only match image tags without the alt attribute.
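For checking the pattern outside the editor, here is a small Python sketch that applies the same regex line by line. The function name lines_missing_alt is an illustrative assumption, and Python's re module is only an approximation of VS Code's search engine.

```python
import re

# Same pattern as above; MULTILINE makes ^ match at the start of each line.
NO_ALT_LINE = re.compile(r'^(?!.*?alt=).*img\ssrc.*', re.MULTILINE)


def lines_missing_alt(source):
    """Return every line that has 'img src' but no 'alt=' anywhere on it."""
    return [match.group(0) for match in NO_ALT_LINE.finditer(source)]
```

Note that the pattern expects src to follow img directly, so a tag written as <img class="x" src="..."> would not be reported.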
I saw there is a right answer but it took me a long time to figure the right regex for this :) so I thought it might help someone.
Press Ctrl+F, choose the Use Regular Expression option, then paste:
^(?!.*\b(alt)\b)(?=.*\b(img)\b).*$
The negative lookahead rejects any line that contains the word alt, and the positive lookahead requires the line to contain the word img.
Hope this helps!
Here is a regex that handles a couple more test cases: (1) the img tag has an alt attribute before the src attribute, which I assume shouldn't be found, and (2) two or more img tags are on the same line. These cases may or may not be of concern.
Find: (<img\s)(?![^>]*alt=)([^>]*>)
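A quick way to try the two extra cases is a sketch like the following, which runs the Find pattern above through Python's re module. The function name img_tags_without_alt is made up for illustration.

```python
import re

# Same pattern as in "Find" above: the lookahead is checked per tag,
# and [^>]* keeps it from looking past the end of the current tag.
IMG_NO_ALT = re.compile(r'(<img\s)(?![^>]*alt=)([^>]*>)')


def img_tags_without_alt(source):
    """Return each <img ...> tag that has no alt= before its closing '>'."""
    return [match.group(0) for match in IMG_NO_ALT.finditer(source)]
```

Since the check is per tag rather than per line, a line with one good tag and one bad tag reports only the bad one.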

Replacing stuff of HTML using regex

I am editing a couple of hundred HTML files and I have to replace all the stuff manually, so I was wondering whether it could be done using regex. I don't think it is possible, but it might be, so please help me out.
Okay, so for example, I have many <p> tags in a file, each with a different class. eg:
<p class="class1">stuff here</p>
<p class="class2">more stuff here</p>
I wanted to replace the "stuff here" and "more stuff here" with something, for example
<p class="class1">[content]</p>
<p class="class2">[content]</p>
I wanted to know if that is possible.
I'm using notepad++.
P.S. I'm new to regex.
I think notepad++ is great for stuff like this. Open up Find/Replace, and check the regular expressions box in the dialog's Search Mode section.
In the "Find what" field, try this:
<p class=(.*)>(.*)</p>
and in "Replace with":
<p class=\1>[content]</p>
The \1 here takes whatever the first (.*) found between class= and the angle bracket > that ends the tag and puts it back unchanged, so the class name is preserved without you having to specify it. The second (.*) catches the current content inside the paragraph tag, which is what you want to replace. So where I wrote [content] in the "Replace with" field, that's where you'd put your new content. This does limit you to content that you can paste into the Notepad++ find/replace dialog, but I think it has a pretty huge limit.
If I'm remembering that text field's limitations incorrectly, another thing you could do is just adjust my "Replace with" text to just replace the old text with some newlines:
<p class=\1>\n\n</p>
This will delete the old text and leave a clear line where it once was, making it easy to paste whatever you want into the normal editor pane.
The first way is probably better, if your new content will fit the Replace With field, because this regex works once per line. And you can click "Replace" a couple times, and if it's working, clicking "Replace all" will iterate through every <p> element in the file.
Note: this solution assumes that your <p> tags open and close within one line, as you typed them in your question. If they break across lines, you're going to want to enable ". matches newline" in the Replace dialog, and you'll need trickier (more precise) syntax than (.*) to catch your class name and content-to-be-replaced. Let me know if this is the case, and I'll fiddle with it and see if I can help more. The (.*) needs to change to (.*?) or something similar; the search needs to get less greedy, because if . matches newline, then .* matches any and every possible character an unlimited number of times, i.e., up to the end of the document.
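The greedy-versus-lazy point in the note can be tried out with a short Python sketch. Here re.DOTALL plays the role of ". matches newline"; the lazy pattern with [^>]* for the class value is one possible tightening, not the only one, and the names below are hypothetical.

```python
import re

# Greedy version with "dot matches newline": runs across paragraphs.
GREEDY = re.compile(r'<p class=(.*)>(.*)</p>', re.DOTALL)

# Tighter version: the class value cannot cross '>', and the content
# stops at the first closing </p>.
LAZY = re.compile(r'<p class=([^>]*)>(.*?)</p>', re.DOTALL)


def replace_paragraph_content(html, new_content):
    """Replace the content of every <p class=...> element, keeping the class."""
    return LAZY.sub(
        lambda m: "<p class=" + m.group(1) + ">" + new_content + "</p>",
        html,
    )
```

On a file with two paragraphs, the greedy pattern finds a single match that swallows both, while the lazy one finds each paragraph separately.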

Remove everything after first space occurs in sublime text 3

Does anyone know how to remove everything after the first space in Sublime Text 3?
For example, I have this file:
abcde fghi jklm
And I would like to have this output:
abcde
After searching a bit this is what worked for me:
In regular expression mode, search for:
\t.*
And replace with
Nothing
And a few years later...
Tested on macOS:
Use ⌥+⌘+F to popup the Find/Replace dialog.
Make sure the .* button is on (this enables the Regex mode).
Find: \ .*
Replace: don't type anything here
Note: the \ in the regex is not strictly necessary; I just added it to emphasize that there is a space before the dot.
Note 2: the answer from the OP works only if you have tabs instead of spaces.
Press Ctrl+H to replace with regex, and use:
Find What: ^(\S+)\s?[^\n]*\n
Replace With: \1\n
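The space-based find and replace from the answers above can also be sketched in a few lines of Python, in case the same cleanup is needed outside Sublime Text. The function name keep_first_word is an assumption for illustration; the character class also covers the tab case mentioned earlier.

```python
import re


def keep_first_word(text):
    """On every line, drop the first space or tab and everything after it."""
    # '.' does not match a newline by default, so each line is
    # truncated independently and the line breaks are preserved.
    return re.sub(r'[ \t].*', '', text)
```

Lines without any space or tab are left untouched.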

Regular Expression for HTML attributes

I need to write a regular expression to match attributes like the following:
class="something_A211"
style="width:380px;margin-top: 20px;"
I have no idea how to write it, can someone help me?
I need this because I have to replace them with nothing in an HTML file (with Notepad++), so that I end up with a clean <tr>, <td>, or any other tag.
Thank you
You can use a regex like this to capture the content:
((?:class|style)=".*?")
However, if you just want to match and delete that you can get rid of capturing groups:
(?:class|style)=".*?"
For all constructions like something="data", you can use this.
[^\s]*?\=\".*?\"
https://regex101.com/r/oQ5dR0/1
The link shows you what everything does.
To explain it briefly: any number of non-space characters can come before the "=", then come the quotes and the data inside them.
The question mark in .*? (any character, any number of times) is needed so that only the minimum number of characters is matched (instead of running on to a later closing quote somewhere further along).
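As a rough illustration of the delete-by-regex idea, here is a Python sketch built on the (?:class|style)=".*?" pattern from above. The leading \s+ is my own addition so that the space before each attribute is removed along with it; the function name strip_class_and_style is hypothetical.

```python
import re

# The lazy .*? stops at the first closing quote. The leading \s+ (an
# addition to the pattern above) removes the separating whitespace too.
CLASS_OR_STYLE = re.compile(r'\s+(?:class|style)=".*?"')


def strip_class_and_style(html):
    """Remove class="..." and style="..." attributes from the markup."""
    return CLASS_OR_STYLE.sub('', html)
```

Requiring whitespace before the attribute name also keeps attributes such as data-class from being cut in half.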

RegEx to substitute tag names, leaving the content and attributes intact

I would like to replace opening and closing tag, leaving the content of tags and its attribute intact.
Here is what I have:
<div class="QText">Text to be kept</div>
to be replaced with
<span class="QText">Text to be kept</span>
I tried this expression which finds all expressions I want but there seems to be no way to replace found expressions.
<div class="QText">(.*?)</div>
Thanks in advance.
I think @AmitJoki's answer will work well enough in certain circumstances, but if you only want to replace div elements when they have a specific attribute or set of attributes, then you would want to use a regex replacement with backreferences. How you specify and refer to a backreference, unfortunately, depends upon your chosen editor. Visual Studio has the most unusual and annoying "flavor" of regex I know of, while Dreamweaver has a fairly typical implementation. Both of them, and I imagine whatever editor you're using, do regex replacement; you just have to know the menu item or keystroke to bring up the dialog.
If memory serves, Dreamweaver has replacement options when you hit Ctrl+F, while in Visual Studio you have to hit Ctrl+H, so try those.
Once you get a "Find" and "Replace" box, you would put something like what you have in your last example above: <div class="QText">(.*?)</div> or perhaps <div class="(QText|RText|SText)">(.*?)</div> into your "Find" box, then put something like <span class="QText">\1</span> or <span class="\1">\2</span> in the "Replacement" box. A few utilities might use $1 to refer to a backreference rather than \1, but you'll have to look up the help or experiment to be sure.
If you are using a language to run this expression, you need to tell us which language.
If you are using a specific editor to run this expression, you need to tell us which editor.
...and never forget the prevailing wisdom on regex and HTML
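For anyone who does end up running this from a language instead of an editor, here is a minimal Python sketch of the backreference replacement described above, using the alternation variant of the pattern. Python writes backreferences as \1 and \2; the function name div_to_span is made up for illustration, and the sketch assumes each div opens and closes on one line with no nested div.

```python
import re

# Group 1 captures the class name, group 2 the content to keep.
DIV_WITH_CLASS = re.compile(r'<div class="(QText|RText|SText)">(.*?)</div>')


def div_to_span(html):
    """Turn matching <div class="..."> elements into <span> elements."""
    return DIV_WITH_CLASS.sub(r'<span class="\1">\2</span>', html)
```

Divs with any other class are not matched and stay as they are.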
Just replace div.
var s="<div class='QText'>Text to be kept</div>";
alert(s.replace(/div/g,"span"));
Demo: http://jsfiddle.net/9sgvP/
Mark it as answer if it helps ;)
Posted as requested
If it's going to be literal like that, capture what's to be kept, then replace the rest:
Find: <div( class="QText">.*?</)div>
Replace: <span$1span>
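To compare this capture-based replacement with simply replacing every "div", here is a small Python sketch of both. Python spells the $1 backreference as \g<1>; the function names naive_replace and targeted_replace are hypothetical.

```python
import re


def naive_replace(html):
    """Replace every occurrence of 'div', including inside the text."""
    return html.replace("div", "span")


def targeted_replace(html):
    """Rename only the tags; group 1 keeps the attribute and the content."""
    return re.sub(
        r'<div( class="QText">.*?</)div>',
        r'<span\g<1>span>',
        html,
    )
```

The difference shows up as soon as the kept text itself contains the letters "div", for example the word "divider".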