Cleaning up text: from ALLCAPS to <em>allcaps</em>

I need to clean up some text for HTML that uses ALLCAPS instead of italics. So I'd like to take something that looks like this:
Here is an artificial EXAMPLE of a piece of TEXT that
uses allcaps as a way of EMPHASIZING words.
And convert it into this:
Here is an artificial <em>example</em> of a piece of <em>text</em> that
uses allcaps as a way of <em>emphasizing</em> words.
I'm tagging this with regex and Notepad++, but (as you can probably tell) I don't know the first thing about how to use them.

There's no such possibility with the Notepad++ regex engine.
You can run a script that does the job, in Perl for example:
perl -pi.back -e "s#\b([A-Z]{2,})\b#'<em>'.lc($1).'</em>'#eg" yourfile.html
yourfile.html is edited in place, and the original is saved as yourfile.html.back. The {2,} quantifier leaves single capitals such as "I" and "A" unchanged.

As far as I know, the regex engine of Notepad++ is not advanced enough to do this.
I would advise using a programming language to accomplish this. In PHP, for example, you could do this:
echo preg_replace_callback('/([A-Z]{2,})/', function ($m) { return '<em>' . strtolower($m[0]) . '</em>'; }, $s);
(create_function() is deprecated since PHP 7.2 and removed in PHP 8.0; an anonymous function works in PHP 5.3 and later.)
Be sure to exclude the legitimate first capital letter of a single word in the regex.
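To see why word boundaries matter here, the sketch below ports the PHP pattern [A-Z]{2,} to JavaScript (an illustration, not the answer's code; `emphasize` is a hypothetical helper) and compares it with a \b-anchored version. Without \b, the pattern also fires on runs of capitals inside mixed-case words such as "PHPUnit".

```javascript
// Hypothetical helper: wrap each match of `pattern` in <em>, lowercased.
function emphasize(text, pattern) {
  return text.replace(pattern, (match) => "<em>" + match.toLowerCase() + "</em>");
}

const sample = "I ran PHPUnit on the TEXT.";

// Port of the PHP pattern: two or more capitals, no word boundaries.
const loose = emphasize(sample, /[A-Z]{2,}/g);
// Anchored version: only whole words made entirely of capitals.
const strict = emphasize(sample, /\b[A-Z]{2,}\b/g);

console.log(loose);  // I ran <em>phpu</em>nit on the <em>text</em>.
console.log(strict); // I ran PHPUnit on the <em>text</em>.
```

Both versions leave the single capital "I" alone because of {2,}; only the anchored one also leaves mixed-case words intact.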

AFAIK you cannot change casing in the Find/Replace mechanism of Notepad++.
If all you need is the <em> tag insertion, you can do the following:
In the Find box type (\s+)([A-Z]+)(\s+), and in the Replace box type \1<em>\2</em>\3.
You could also try some of the TextFX tools, perhaps in the TextFX Characters sub-menu.
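The whitespace-based pattern has two blind spots: a capitalized word at the start or end of a line (or next to punctuation) has no whitespace on both sides, and two capitalized words in a row share one space, so the first match consumes it. The sketch below reproduces the same pattern in JavaScript (where the replacement uses $1 instead of \1) and compares it with a word-boundary version; the sample sentence is made up.

```javascript
const input = "HERE are TWO WORDS and an END.";

// Same pattern as the Find/Replace answer; keeps the original casing.
const spaced = input.replace(/(\s+)([A-Z]+)(\s+)/g, "$1<em>$2</em>$3");

// Word boundaries need no surrounding whitespace, so nothing is consumed.
const bounded = input.replace(/\b([A-Z]{2,})\b/g, "<em>$1</em>");

console.log(spaced);  // HERE are <em>TWO</em> WORDS and an END.
console.log(bounded); // <em>HERE</em> are <em>TWO</em> <em>WORDS</em> and an <em>END</em>.
```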

Here is how to do this using JavaScript's string replace method:
var capfix = function (x) {
  // Wrap an all-caps word (two or more letters) in <em> and lowercase it
  var emout = function (word) {
    return "<em>" + word.toLowerCase() + "</em>";
  };
  // \b keeps surrounding spaces and punctuation outside the match
  return x.replace(/\b[A-Z]{2,}\b/g, emout);
};
To execute, just call:
capfix(yourData);
This assumes that "yourData" is a variable holding your data as a string. If you want to use a web tool, "yourData" could be the value of an input control, as in the following:
var yourData = document.getElementById("myinput").value;
alert(capfix(yourData));
To make that work just put an id attribute on your web tool input such as:
<textarea id="myinput"></textarea>

Related

Regex for different pair of html tags

I need a regex matching every pair of <p>...<br> and <p CLASS='extmsg'>...<br> to distinguish the parts of a chat conversation, which I receive as a string in the following format:
<p CLASS='extmsg'>16:30:24 ~ customer@home.com: hello<br>
<p>16:30:14 ~ consultant@company.com: hello to you<br>
<p CLASS='extmsg'>16:30:03 ~ sam.i.am@greeneggs.ham: how are you<br>
<p>03/06/2018 16:29:55 ~ bok.kier@ccc.pl: im fine<br>
I need it for a parsing method.
Don't parse HTML with regex; use a proper XML/HTML parser.
Theory:
According to compiler theory, HTML can't be parsed with regular expressions, which are based on finite-state machines. Because HTML is hierarchical, you need a pushdown automaton and an LALR grammar, built with a tool like YACC.
realLife©®™ everyday tools in a shell:
You can use one of the following:
xmllint
xmlstarlet
saxon-lint (my own project)
See: Using regular expressions with HTML tags
Example (libxml2's HTML parser lowercases attribute names, so the XPath uses @class):
xmllint --html --xpath '//p[@class="extmsg"]/text()' file
Regexes are not suitable for this, as per Giles Quenot's answer. Using a proper parser is a much better way to do this. If you do receive messages in the format shown:
One message per line
Every message starts with "<p"
Every message ends with "<br>"
an easier idea might be string-matching the start of the line instead. I don't know what language you're using, but an example in JavaScript might be:
var inputString = ""; // From wherever you get your data
var lines = inputString.split("\n");
for (var i = 0; i < lines.length; i++) {
  var line = lines[i];
  // Customer messages are the only ones that start with the extmsg class
  if (line.indexOf("<p CLASS='extmsg'>") === 0) {
    console.log("Customer just said: " + line);
  } else {
    console.log("Representative just said: " + line);
  }
}
You can trim the <p> and <br> tags out too, as you already know how long they are.
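The trimming step could look like the following sketch, assuming each message is exactly one line and the tags are spelled as in the sample; `parseLine` and the speaker labels are hypothetical names, and the sample lines are taken from the question.

```javascript
// Tag strings as they appear in the sample data (assumed to be stable).
const CUSTOMER_OPEN = "<p CLASS='extmsg'>";
const REP_OPEN = "<p>";
const CLOSE = "<br>";

// Hypothetical helper: classify one chat line and strip its wrapping tags.
function parseLine(line) {
  const trimmed = line.trim(); // drop a trailing "\r" from Windows line endings
  let speaker;
  let body;
  if (trimmed.startsWith(CUSTOMER_OPEN)) {
    speaker = "customer";
    body = trimmed.slice(CUSTOMER_OPEN.length);
  } else if (trimmed.startsWith(REP_OPEN)) {
    speaker = "representative";
    body = trimmed.slice(REP_OPEN.length);
  } else {
    return null; // not a message line
  }
  if (body.endsWith(CLOSE)) {
    body = body.slice(0, -CLOSE.length);
  }
  return { speaker, text: body };
}

const sample =
  "<p CLASS='extmsg'>16:30:24 ~ customer@home.com: hello<br>\n" +
  "<p>16:30:14 ~ consultant@company.com: hello to you<br>";
const messages = sample.split("\n").map(parseLine).filter((m) => m !== null);
console.log(messages);
```

Because the tag lengths come from the constants, changing the expected markup only means editing those three strings.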
NOTE: This will break if the format of the data changes (e.g. a designer gets into the CSS file, starts using BEM notation, changes extmsg to message--external, and adds message--internal to the rep's messages), but so would a regex or a parser. The best way to deal with this is to get whoever supplies the data to give you a proper API for it.

How to get Emmet recognize text replacement or aliases in class names?

For example: I want to get ".row>.small-9-centered.small-3-centered" by writing ".row>div.s9c.s3c". I am trying to tell Emmet to replace "s9c" with "small-9-centered".
Any suggestions? If it is not possible, do you know any way to script a plugin that does this by detecting the text after a "." (dot) and before another dot or ">"?
I was looking for a text replacement, or something like var replacement in Sublime, but I didn't find anything. Even if I did, it might not be recognized by the Emmet parser. That is why I was looking for something like snippet abbreviations, but only for text, since Emmet snippets always assume they are tags and add closing tags.
Thank you in advance,
You can do it with this line of JavaScript: str.replace("s9c", "small-9-centered"); it looks for a string and replaces it with another. Note that a plain string pattern only replaces the first occurrence. See fiddle: http://jsfiddle.net/JchE5/448/
JavaScript:
function myFunction() {
var str = document.getElementById("change").innerHTML;
var res = str.replace("s9c", "small-9-centered");
document.getElementById("change").innerHTML = res;
}
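Unlike the single str.replace("s9c", ...) call above, which handles one alias and only its first occurrence, the sketch below expands every alias in an abbreviation before it is handed to Emmet. It follows the "text after a dot, before the next dot or >" rule from the question; the `aliases` table and `expandAliases` are hypothetical, and hooking this into an editor or Emmet itself is not shown.

```javascript
// Hypothetical alias table: short class names and what they expand to.
const aliases = {
  s9c: "small-9-centered",
  s3c: "small-3-centered",
};

// Expand every class name that follows a "." and runs up to the next
// character that cannot be part of a name (".", ">", "+", end of string...).
function expandAliases(abbreviation) {
  return abbreviation.replace(/\.([\w-]+)/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(aliases, name) ? "." + aliases[name] : match
  );
}

console.log(expandAliases(".row>div.s9c.s3c"));
// .row>div.small-9-centered.small-3-centered
```

Class names without an entry in the table, such as "row", pass through unchanged.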

Search and replace a variable word

I'm looking for a program which can create a text block with a different variable string every time.
I've tried doing this in certain languages, but I'd rather have a text editor which can do this.
Example: a list of words is used to replace a variable in a piece of text, and the text is then reprinted for each word.
I like Ice cream.
Ice cream is great.
Don't eat too much Ice cream.
I like Banana.
Banana is great.
Don't eat too much Banana.
I like Apple.
Apple is great.
Don't eat too much Apple.
I tried doing this in a programming language (AS3), but it doesn't support multi-line strings very well.
What I'm looking for is either a text editor program (for Windows) that can do this, or an AS3 code snippet that can do this (one that supports multi-line text without my having to put \n everywhere manually).
Not sure what to suggest for the multi-line issue: that's just how it is, and you have to add \n or <br /> (in HTML text boxes).
As for the replace, that's a straightforward process. Just set up some type of token that you can replace in the text, e.g.
var str:String = "I like {}.\n{} is great.\nDon't eat too much {}.";
Then you can do either:
str.split("{}").join("Banana");
Or:
str.replace(/\{\}/g, "Banana");
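For comparison, here is the same token approach in JavaScript, a close ECMAScript relative of AS3 (this is not AS3 code). It repeats the text block once per word from the question's list; the template literal can span several lines, which sidesteps the manual \n problem.

```javascript
// Template with "{}" as the placeholder token, as in the AS3 answer.
// A JavaScript template literal can span several lines without "\n".
const template = `I like {}.
{} is great.
Don't eat too much {}.`;

const words = ["Ice cream", "Banana", "Apple"];

// split/join replaces every "{}" occurrence, like str.split("{}").join(word).
const blocks = words.map((word) => template.split("{}").join(word));

console.log(blocks.join("\n\n"));
```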
The String class has three handy methods for working with patterns and strings. These methods are case-sensitive, meaning uppercase and lowercase matter when searching.
match()
search()
replace()
var string1:String = "Hello World!";
var subString:String = "Hell";
trace(string1.match(subString));
trace(string1.search(subString));
trace(string1.replace(subString, "Jell"));
The match() method returns the matching substring (in an array) if it is found, and null if not. The search() method returns the index of the first match, or -1 if there is none; here it gives 0 only because "Hell" is at the very start of the string. The replace() method returns a new string in which the target substring is replaced, if it is found. You can also replace it with an empty string to simply remove an unwanted part of a string.
We can then use search() in a conditional like this:
var string1:String = "Hello World!";
var subString:String = "Hell";
if (string1.search(subString) != -1) {
trace(subString + " is in the string, I can now replace it or remove it.");
} else {
trace(subString + " is not in this string.");
}
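JavaScript's String.prototype.search() also returns a position (or -1), so it is an easy way to check why the "is it in the string?" test must compare against -1 rather than 0. This is a JavaScript illustration, not AS3; `contains` is a hypothetical helper. Since search() treats its argument as a regular expression, indexOf() is the safer choice for plain substrings containing characters like "." or "?".

```javascript
const string1 = "Hello World!";

// search() returns the index of the first match, or -1 if there is none.
console.log(string1.search("Hell"));  // 0
console.log(string1.search("World")); // 6
console.log(string1.search("xyz"));   // -1

// So a "contains" check must test against -1; "World" is found at index 6.
const contains = (haystack, needle) => haystack.search(needle) !== -1;
console.log(contains(string1, "World")); // true
```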

Selenium: test if element contains some text

With Selenium IDE, how can I test if an element's inner text contains a specific string? For example:
<p id="fred">abcde</p>
Does 'id=fred' contain "bcd"? (should be true)
The Selenium-IDE documentation is helpful in this situation.
The command you are looking for is assertText; the locator would be id=fred and the text, for example, *bcd*.
It can be done with a simple wildcard:
verifyText
id=fred
*bcd*
See the Selenium IDE documentation.
You can also use:
assertElementPresent
css=p#fred:contains('bcd')
A solution with XPath:
Command: verifyElementPresent
Target: xpath=//p[@id='fred' and contains(.,'bcd')]
If you are able to use jQuery, try something like:
$("p#fred:contains('bcd')").css("text-decoration", "underline");
It seems regular expressions might work:
"The simplest character set is a character. The regular expression "the" contains three
character sets: "t," "h" and "e". It will match any line with the string "the" inside it.
This would also match the word "other". "
(From site: http://www.grymoire.com/Unix/Regular.html)
If you are using Visual Studio (C#), there is functionality for evaluating strings with regular expressions of ALL kinds (not just contains):
using System.Text.RegularExpressions;
Regex.IsMatch("YourInnerText", @"^[a-zA-Z]+$");
The expression I posted will check if the string contains ONLY letters.
According to my link, your regular expression would then be "bcd", or some string you construct at runtime. Or:
Regex.IsMatch("YourInnerText", @"bcd");
(Something like that anyway)
Hope it helped.
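For comparison, the same two checks written in JavaScript, where a regex literal's test() method plays the role of Regex.IsMatch; the inner text "abcde" is the sample element from the question.

```javascript
const innerText = "abcde"; // text of <p id="fred"> from the question

// Contains "bcd" anywhere (same idea as Regex.IsMatch(text, @"bcd")).
console.log(/bcd/.test(innerText)); // true

// Consists ONLY of letters, like @"^[a-zA-Z]+$".
console.log(/^[a-zA-Z]+$/.test(innerText)); // true
console.log(/^[a-zA-Z]+$/.test("abc1"));    // false
```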
You can use the command assertTextPresent or verifyText

Qt Regex matches HTML Tag InnerText

I have an HTML file with one <pre>...</pre> tag. What regex do I need to match all the content within the <pre> tags?
QString pattern = "<pre>(.*)</pre>";
QRegExp rx(pattern);
rx.setCaseSensitivity(cs);
int pos = 0;
QStringList list;
while ((pos = rx.indexIn(clipBoardData, pos)) != -1) {
list << rx.cap(1);
pos += rx.matchedLength();
}
list.count() is always 0
HTML is not a regular language; you should not use regular expressions to parse it.
Instead, use QXmlSimpleReader to load the XML, then QXmlQuery to find the PRE node and then extract its contents.
DO NOT PARSE HTML USING Regular Expressions!
Instead, use a real HTML parser, such as this one
I did it using substrings:
int begin = clipBoardData.indexOf("<pre");   // start of the opening <pre ...> tag
int end = clipBoardData.indexOf("</body>");  // stop at </body>
QString result = clipBoardData.mid(begin, end - begin);
The result includes the <pre> tags, but I found that's even better ;)
I have to agree with the others. Drupal 6.x and older use regexes to do a lot of work on HTML data, and this quickly breaks on pages of 64 KB or more. So using a DOM, or just indexOf() as you've done, is a better and much faster solution.
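Now, for those interested in knowing more about regex: QRegExp uses Perl-like syntax, but it does not support lazy quantifiers such as *?. Instead, call setMinimal(true), which makes every quantifier in the pattern non-greedy. Your regex can then be:
<pre>(.*)</pre>
which stops at the first </pre> instead of the last one. Note that no delimiters are required at the start and end of the regular expression here.
QRegExp re("<pre>(.*)</pre>", Qt::CaseInsensitive);
re.setMinimal(true);                    // non-greedy: .* stops at the first </pre>
re.indexIn(html_input);
QStringList list = re.capturedTexts();  // [0] whole match, [1] text between the tags
Now list should hold the first <pre> block and its content. To collect every block, call indexIn(html_input, pos) in a loop, as in the question. (QRegularExpression, available since Qt 5, is PCRE-based and does accept *?.)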
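The difference between greedy and lazy matching is easy to see outside Qt. The sketch below uses JavaScript's regex engine, which supports *?, on a made-up HTML string with two <pre> blocks. It only illustrates the matching behavior and, as the other answers stress, is no substitute for an HTML parser.

```javascript
const html = "<body><pre>first</pre>\n<p>x</p>\n<pre>second\nline</pre></body>";

// Greedy: [\s\S]* runs to the LAST </pre>, swallowing everything in between.
// [\s\S] is used instead of "." so the match can cross newlines.
const greedy = html.match(/<pre>([\s\S]*)<\/pre>/i);
console.log(greedy[1]);

// Lazy: *? stops at the FIRST </pre>; with the g flag, matchAll finds each block.
const lazy = [...html.matchAll(/<pre>([\s\S]*?)<\/pre>/gi)].map((m) => m[1]);
console.log(lazy); // two entries: "first" and "second\nline"
```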