Find the word and replace with html tag using regex - html

I have a text equation like: 10x^2-8y^2-7k^4=0.
How can I find the ^ and replace it with <sup>2</sup> in the whole string using regex? The result should be like:
10x<sup>2</sup>-8y<sup>2</sup>-7k<sup>4</sup>=0
I tried str = str.replace(/\^\s/g, "<sup>$1</sup> ") but I’m not getting the expected result.
Any ideas that can help to solve my problem?

I think you're looking for something like
\^(\d+)
It matches the ^ and captures the exponent, so you can replace the match with
<sup>$1</sup>
See it here at regex101.
Edit:
To meet your new demands, check this fiddle. It handles the sub as well using replace with a function.
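A minimal JavaScript sketch of the pattern and replacement above, applied to the equation from the question. The function name caretToSup is made up for illustration.

```javascript
// Hypothetical helper: wrap every "^<digits>" exponent in <sup> tags.
function caretToSup(equation) {
  // \^ matches a literal caret, (\d+) captures the digits that follow it,
  // and the g flag replaces every occurrence rather than only the first.
  return equation.replace(/\^(\d+)/g, "<sup>$1</sup>");
}

const converted = caretToSup("10x^2-8y^2-7k^4=0");
console.log(converted);
```

Without the g flag only the first exponent would be replaced.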

Your current pattern matches a caret followed by a whitespace character (space, tab, newline, etc.), but you want to match a caret followed by either a single character or several characters wrapped in curly braces, as your string is in TeX notation.
/\^(?:(\w)|\{([^{}]+)\})/g
Now, using str = str.replace(/\^(?:(\w)|\{([^{}]+)\})/g, "<sup>$1$2</sup>"); should do the job. Only one of the two groups takes part in any given match and the other is substituted as an empty string, so $1$2 always yields the exponent.
You can make a more generic function from this expression that can wrap characters prefixed by a specific character with a specific tag.
function wrapPrefixed(string, prefix, tagName) {
// The "g" flag replaces every occurrence, not just the first one.
return string.replace(new RegExp("\\" + prefix + "(?:(\\w)|\\{([^{}]+)\\})", "g"), "<" + tagName + ">$1$2</" + tagName + ">");
}
For instance, calling wrapPrefixed("1_2 + 4_{3+2}", "_", "sub"); results in 1<sub>2</sub> + 4<sub>3+2</sub>.
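One caveat with the helper above is that prefixing a backslash only works for punctuation prefixes; a letter such as d would turn into the \d character class. The sketch below is one possible variant, not the answerer's code: escapeRegExp and wrapPrefixedSafe are hypothetical names, and the braced group is assumed to accept anything except braces.

```javascript
// Hypothetical helper: escape regex metacharacters in arbitrary text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap the token after `prefix` (one word character, or a {braced} group) in <tagName>.
function wrapPrefixedSafe(input, prefix, tagName) {
  const pattern = new RegExp(
    escapeRegExp(prefix) + "(?:(\\w)|\\{([^{}]+)\\})",
    "g"
  );
  // A replacer function receives undefined for the group that did not match.
  return input.replace(pattern, function (match, single, braced) {
    const content = single !== undefined ? single : braced;
    return "<" + tagName + ">" + content + "</" + tagName + ">";
  });
}

const subscripted = wrapPrefixedSafe("1_2 + 4_{3+2}", "_", "sub");
const superscripted = wrapPrefixedSafe("x^2 + y^{10}", "^", "sup");
console.log(subscripted, superscripted);
```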

Related

How to write regex expression for this type of text?

I'm trying to extract the price from the following HTML.
<td>$75.00/<span class='small font-weight-bold text-danger'>Piece</span></small> *some more text here* </td>
What is the regex expression to get the number 75.00?
Is it something like:
<td>$*/<span class='small font-weight-bold text-danger'>
The dollar sign is a special character in regex, so you need to escape it with a backslash. Also, you only want to capture digits, so you should use character classes.
<td>\$(\d+[.]\d\d)/<span
As the other respondent mentioned, regex changes a bit with each implementing language, so you may have to make some adjustments, but this should get you started.
I think you can go with /[0-9]+\.[0-9]+/.
[0-9] matches a single digit. In this example you would get the digit 7.
The + afterwards says that it should match one or more digits, not just one. So [0-9]+ will match 75. It stops there because the character after 5 is a period.
That said, we add a period to the regex and make sure it's escaped. An unescaped period means "any character". Escaped, it matches only a literal period. So we have /[0-9]+\./ so far.
Next we just need to add [0-9]+ so it will find the digits after the period too.
It's important that you don't give it the global flag, like this: /[0-9]+\.[0-9]+/g, unless you want it to find more than just the first number/period combination.
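A short JavaScript sketch of the step-by-step pattern above, run against the HTML from the question. The second input string with two prices is invented to show what the global flag changes.

```javascript
const html =
  "<td>$75.00/<span class='small font-weight-bold text-danger'>Piece</span></small> *some more text here* </td>";

// Without the g flag, match() returns only the first digits.digits run (or null).
const firstMatch = html.match(/[0-9]+\.[0-9]+/);
const price = firstMatch ? Number(firstMatch[0]) : null;

// With the g flag, match() returns every non-overlapping match as an array.
// The input below is a made-up example containing two prices.
const allMatches = "first $75.00, second $12.50".match(/[0-9]+\.[0-9]+/g);

console.log(firstMatch[0], price, allMatches);
```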
There is another regex you can use. It uses the parentheses to group the part you're looking for like this: /<td>\$(.+)<span/
It will match everything from <td>$ up to <span. From there you can filter out the group/part you're looking for. See the examples below.
// JavaScript
const text = "<td>$something<span class='small font-weight..."
const regex = /<td>\$(.+)<span/g
const match = regex.exec(text) // this will return an Array
console.log( match[1] ) // prints out "something"
# Python
import re
text = "<td>$something<span class='small font-weight..."
regex = re.compile(r"<td>\$(.+)<span")
print(regex.search(text).group(1))  # prints out "something"
As an alternative you could use a DOMParser.
Wrap your <td> inside a table, use for example querySelector to get your element and get the first node from the childNodes.
That would give you $75.00/.
To remove the $ and the trailing forward slash you could use slice or use a regex like \$(\d+\.\d+) and get the value from capture group 1.
let html = `<table><tr><td>$75.00/<span class='small font-weight-bold text-danger'>Piece</span></small> *some more text here* </td></tr></table>`;
let parser = new DOMParser();
let doc = parser.parseFromString(html, "text/html");
let result = doc.querySelector("td");
let textContent = result.childNodes.item(0).nodeValue;
console.log(textContent.slice(1, -1));
console.log(textContent.match(/\$(\d+\.\d+)/)[1]);

Add a "/" between each word in a String

I've got this string :
var str:String = mySharedObject.data.theDate;
where mySharedObject.data.theDate contains some words (not always the same words, as it depends on which button the user clicked).
So mySharedObject.data.theDate = "words words words".
Is it possible to add a "/" between each word ? (without knowing which words are in mySharedObject.data.theDate).
In order to have:
mySharedObject.data.theDate = "words/words/words".
Edit: You can replace " " with "/" in your string. The line below splits the string on the " " separator and then joins the pieces with "/":
mySharedObject.data.theDate= mySharedObject.data.theDate.split(" ").join("/")
You can also do that using String.replace() with a small regular expression that replaces every whitespace character (note the g (global) flag, which replaces all instances rather than only the first), like this:
var s:String = 'word word word';
trace(s.replace(/\s/g, '/')); // gives : word/word/word
And for more about regular expressions take a look here.
Hope that can help.
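For comparison, here is a JavaScript sketch of the same idea (ActionScript 3 and JavaScript share this String API). joinWords is a hypothetical name, and collapsing runs of whitespace into a single separator is an assumption beyond what the question asks for.

```javascript
// Hypothetical helper: put `separator` between the words of `text`.
function joinWords(text, separator = "/") {
  // trim() drops leading/trailing whitespace; splitting on \s+ treats any
  // run of spaces, tabs or newlines as a single word boundary.
  return text.trim().split(/\s+/).join(separator);
}

const joined = joinWords("words words words");
const joinedMessy = joinWords("  words   words\twords ");
console.log(joined, joinedMessy);
```

Splitting on a single " " instead would produce empty pieces, and therefore doubled separators, wherever the input contains two spaces in a row.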

Find word that starts at a newline

I have a simple loop to delete all words from the end of a text that start with a # and space.
AS3:
// messageText is usually taken from a user's input field - therefore the newline is not present in the "messageText"
var messageText = "hello world #foo lorem ipsum #findme"
while (messageText.lastIndexOf(" ") == messageText.lastIndexOf(" #")){
messageText = messageText.slice(0,messageText.lastIndexOf(" "));
}
How to check if the position before the # is not a space but a newline?
I tried this but nothing gets found:
while (messageText.lastIndexOf(" ") == messageText.lastIndexOf("\n#")){
messageText = messageText.slice(0,messageText.lastIndexOf(" "));
}
\n is the newline character on Unix-like systems (including OS X / macOS).
\r\n is the Windows version.
\r is the classic Mac OS (pre-OS X) version.
See also: this previous (dupe) post.
First thing is I'd manually try replacing "\n" with "\r\n" and then "\r" to see if there is some other newline in use. If so, then you just need a better search term that will match each version in one go.
A better solution might be to use a regular expression (RegExp). You are explicitly looking for the newline character and a space after it. You could use this regex pattern to look for the start of a line with a single space:
var pattern:RegExp = /^\s/;
if (yourString.search(pattern) >= 0) { ... }
The ^ caret character anchors the match to the start of the input; to make it match at the start of every line, add the multiline flag (/^\s/m). The \s is a placeholder for any whitespace character, so if you don't want to match tabs then change it to a plain space. (I'm not familiar with ActionScript specifically, but that syntax looks OK and search() will return -1 if the pattern isn't found.)
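Putting the two suggestions together, one way to approach the original task is a single regular expression that treats a space and any newline variant alike. This is a JavaScript sketch rather than tested ActionScript; stripTrailingHashtags is a hypothetical name, and normalizing \r\n and \r to \n first is an assumption about what the input may contain.

```javascript
// Hypothetical helper: remove "#word" tokens from the end of a message,
// whether they are preceded by a space or by a line break.
function stripTrailingHashtags(messageText) {
  // Normalize Windows (\r\n) and lone \r line endings to \n.
  const normalized = messageText.replace(/\r\n?/g, "\n");
  // One or more groups of "whitespace + #word", anchored to the end of the text.
  return normalized.replace(/(?:\s+#\S+)+\s*$/, "");
}

const fromSpaces = stripTrailingHashtags("hello world #foo lorem ipsum #findme");
const fromNewline = stripTrailingHashtags("hello world\n#foo #findme");
const fromWindows = stripTrailingHashtags("hello\r\n#findme");
console.log(fromSpaces, fromNewline, fromWindows);
```

Because \s matches spaces, tabs and newlines, no separate lastIndexOf check per separator is needed.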

Convert QString to text with substitutes for HTML special characters (e.g. tags)

The user will be able to enter some text into a QLineEdit in a Qt environment. However, this input text can contain HTML special characters. My aim is to convert the text by replacing all occurrences of HTML special characters with substitutes.
A similar case is found in PHP with the htmlspecialchars() function http://php.net/manual/en/function.htmlspecialchars.php.
The main reason I want to do this is because I want to display the user input in a richtext QTextEdit and I don't want the user to be able to change HTML and I wish to be able to use HTML special characters without too much hassle.
How can this be achieved?
The easiest way I know is to use QTextEdit::toHtml:
QString convert()
{
QString s = lineEdit->text();
QTextEdit textEdit;
textEdit.setPlainText(s);
QString ret = textEdit.toHtml();
int firstClosingTag = ret.indexOf("</p></body></html>");
int lastOpeningTag = ret.lastIndexOf(">", firstClosingTag);
return ret.mid(lastOpeningTag + 1, firstClosingTag - lastOpeningTag - 1);
}
There are also two functions, which you could find useful:
Qt::convertFromPlainText() and Qt::escape()
In Qt5, it's QString::toHtmlEscaped, e.g.:
QString a = "Hello, <span class=\"name\">Bear</span>!";
// a will contain: Hello, <span class="name">Bear</span>!
QString b = a.toHtmlEscaped();
// b will contain: Hello, &lt;span class=&quot;name&quot;&gt;Bear&lt;/span&gt;!
This is the direct equivalent of htmlspecialchars in PHP. It replaces the Qt::escape function (mentioned by Amartel), which does the same thing but is now obsolete.
The Qt::convertFromPlainText function (also mentioned by Amartel) still exists in Qt 5, but it does more than PHP's htmlspecialchars. Not only does it replace < with &lt;, > with &gt;, & with &amp;, and " with &quot;, but it also handles whitespace characters (space, tab, line feed, etc.) so that the generated HTML looks visually similar to the original plain text. In particular, it may emit <p>…</p>/<br> for line feeds, non-breaking spaces for spaces, and multiple non-breaking spaces for tabs. In other words, this function is not just htmlspecialchars; it is even more comprehensive than the nl2br(htmlspecialchars($s)) combination.
Note that unlike PHP's htmlspecialchars with ENT_QUOTES, none of the Qt functions listed in this answer replace the single quote (') with &apos;/&#039;. So, for example, QString html = "<img alt='" + s.toHtmlEscaped() + "'>"; won't be safe; only QString html = "<img alt=\"" + s.toHtmlEscaped() + "\">"; will be. (However, as < is replaced and ' has no special meaning outside <…>, something like QString html = "<b>" + s.toHtmlEscaped() + "</b>"; would also be safe.)
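The quoting caveat is not specific to Qt. As an illustration only, the JavaScript sketch below performs the same four replacements described above and makes the single-quote handling an explicit option. escapeHtml and its escapeSingleQuotes parameter are hypothetical and not part of Qt or PHP.

```javascript
// Hypothetical helper mirroring the replacements described above:
// & < > " are always escaped; ' is escaped only on request.
function escapeHtml(text, escapeSingleQuotes = false) {
  let escaped = text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
  if (escapeSingleQuotes) {
    escaped = escaped.replace(/'/g, "&#039;");
  }
  return escaped;
}

const escapedMarkup = escapeHtml('Hello, <span class="name">Bear</span>!');
const escapedQuote = escapeHtml("it's", true);
console.log(escapedMarkup, escapedQuote);
```

With escapeSingleQuotes left at false, the output is only safe inside double-quoted attributes or element content, which matches the note above.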

How to remove more than one whitespace character from HTML?

I want to remove extra whitespace which is coming from the user end, but I can't predict the format of the HTML.
For example:
<p> It's interesting that you would try cfsetting, since nothing in it's
documentation would indicate that it would do what you are asking.
Unless of course you were mis-reading what "enableCFoutputOnly" is
supposed to do.
</p>
<p>
It's interesting that you would try cfsetting, since nothing in it's
documentation would indicate that it would do what you are asking.
Unless of course you were mis-reading what "enableCFoutputOnly" is
supposed to do.</p>
Please guide me on how to remove more than one whitespace character from HTML.
You could use a regex to replace each pair of whitespace characters with a single space, looping over the result until no runs of multiple whitespace characters remain:
lastTry = "<p> lots of space </p>";
nextTry = rereplace(lastTry,"\s\s", " ", "all");
while(nextTry != lastTry) {
lastTry = nextTry;
nextTry = REReplace(lastTry,"\s\s", " ", "all");
}
Tested working in CF10.
If you would rather not do it in code, out of sheer laziness, you can use a tool:
=> http://jsbeautifier.org/
If you want to do it in code, then a regex is another option.
This should do it:
<cfscript>
string function stripCRLFAndMultipleSpaces(required string theString) {
local.result = trim(rereplace(trim(arguments.theString), "([#Chr(09)#-#Chr(30)#])", " ", "all"));
local.result = trim(rereplace(local.result, "\s{2,}", " ", "all"));
return local.result;
}
</cfscript>
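The same collapse can be written without a loop, because a quantified pattern consumes a whole run of whitespace in one match. Below is a JavaScript sketch of that idea, not ColdFusion; collapseWhitespace is a hypothetical name.

```javascript
// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// line breaks) into a single space and trim both ends.
function collapseWhitespace(html) {
  return html.replace(/\s+/g, " ").trim();
}

const collapsed = collapseWhitespace("<p>   lots   of \n\t space </p>");
console.log(collapsed);
```

Note that this treats the markup as plain text, so whitespace inside elements such as <pre>, where it is significant, would be collapsed as well.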