Convert QString to text with substitutes for HTML special characters (e.g. tags) - html

The user can enter text into a QLineEdit in a Qt application. This input may contain HTML special characters. My aim is to convert the text by replacing every occurrence of an HTML special character with a substitute.
A similar case is found in PHP with the htmlspecialchars() function http://php.net/manual/en/function.htmlspecialchars.php.
The main reason I want to do this is because I want to display the user input in a richtext QTextEdit and I don't want the user to be able to change HTML and I wish to be able to use HTML special characters without too much hassle.
How can this be achieved?

The easiest way I know, is to use QTextEdit::toHtml:
QString convert()
{
QString s = lineEdit->text();
QTextEdit textEdit;
textEdit.setPlainText(s);
QString ret = textEdit.toHtml();
int firstClosingTag = ret.indexOf("</p></body></html>");
int lastOpeningTag = ret.lastIndexOf(">", firstClosingTag);
return ret.mid(lastOpeningTag + 1, firstClosingTag - lastOpeningTag - 1);
}
There are also two functions, which you could find useful:
Qt::convertFromPlainText() and Qt::escape()

In Qt5, it's QString::toHtmlEscaped, e.g.:
QString a = "Hello, <span class=\"name\">Bear</span>!";
// a will contain: Hello, <span class="name">Bear</span>!
QString b = a.toHtmlEscaped();
// b will contain: Hello, &lt;span class=&quot;name&quot;&gt;Bear&lt;/span&gt;!
This is the direct equivalent of htmlspecialchars in PHP. It replaces the Qt::escape function (mentioned by Amartel), which does the same thing but is now obsolete.
The Qt::convertFromPlainText function (also mentioned by Amartel) still exists in Qt 5, but it does more than PHP's htmlspecialchars. Not only does it replace < with &lt;, > with &gt;, & with &amp; and " with &quot;, it also handles whitespace characters (space, tab, line feed, etc.) so that the generated HTML looks visually similar to the original plain text. In particular, it may emit <p>…</p>/<br> for line feeds, non-breaking spaces for spaces and multiple non-breaking spaces for tabs. In other words, this function is not just htmlspecialchars; it is even more comprehensive than the nl2br(htmlspecialchars($s)) combination.
Note that, unlike PHP's htmlspecialchars with ENT_QUOTES, none of the Qt functions listed in this answer replace the single quote (') with &apos; or &#039;. So, for example, QString html = "<img alt='" + s.toHtmlEscaped() + "'>"; won't be safe, only QString html = "<img alt=\"" + s.toHtmlEscaped() + "\">"; will. (However, as < is replaced and ' has no special meaning outside <…>, something like QString html = "<b>" + s.toHtmlEscaped() + "</b>"; would also be safe.)

Appending long snippets of html in jquery [duplicate]

I have the following code in Ruby. I want to convert this code into JavaScript. What is the equivalent code in JS?
text = <<"HERE"
This
Is
A
Multiline
String
HERE
Update:
ECMAScript 6 (ES6) introduces a new type of literal, namely template literals. They have many features, variable interpolation among others, but most importantly for this question, they can be multiline.
A template literal is delimited by backticks:
var html = `
<div>
<span>Some HTML here</span>
</div>
`;
(Note: I'm not advocating to use HTML in strings)
Browser support is OK, but you can use transpilers to be more compatible.
Original ES5 answer:
Javascript doesn't have a here-document syntax. You can escape the literal newline, however, which comes close:
"foo \
bar"
ES6 Update:
As the first answer mentions, with ES6/Babel, you can now create multi-line strings simply by using backticks:
const htmlString = `Say hello to
multi-line
strings!`;
Interpolating variables is a popular new feature that comes with back-tick delimited strings:
const htmlString = `${user.name} liked your post about strings`;
This just transpiles down to concatenation:
user.name + ' liked your post about strings'
Original ES5 answer:
Google's JavaScript style guide recommends to use string concatenation instead of escaping newlines:
Do not do this:
var myString = 'A rather long string of English text, an error message \
actually that just keeps going and going -- an error \
message to make the Energizer bunny blush (right through \
those Schwarzenegger shades)! Where was I? Oh yes, \
you\'ve got an error and all the extraneous whitespace is \
just gravy. Have a nice day.';
The whitespace at the beginning of each line can't be safely stripped at compile time; whitespace after the slash will result in tricky errors; and while most script engines support this, it is not part of ECMAScript.
Use string concatenation instead:
var myString = 'A rather long string of English text, an error message ' +
'actually that just keeps going and going -- an error ' +
'message to make the Energizer bunny blush (right through ' +
'those Schwarzenegger shades)! Where was I? Oh yes, ' +
'you\'ve got an error and all the extraneous whitespace is ' +
'just gravy. Have a nice day.';
The pattern text = <<"HERE" This Is A Multiline String HERE is not available in JS (I remember using it a lot in my good old Perl days).
To keep oversight with complex or long multiline strings I sometimes use an array pattern:
var myString =
['<div id="someId">',
'some content<br />',
'someRefTxt',
'</div>'
].join('\n');
or the pattern anonymous already showed (escape newline), which can be an ugly block in your code:
var myString =
'<div id="someId"> \
some content<br /> \
someRefTxt \
</div>';
Here's another weird but working 'trick'1:
var myString = (function () {/*
<div id="someId">
some content<br />
someRefTxt
</div>
*/}).toString().match(/[^]*\/\*([^]*)\*\/\}$/)[1];
ES20xx supports spanning strings over multiple lines using template strings:
let str = `This is a text
with multiple lines.
Escapes are interpreted,
\n is a newline.`;
let str = String.raw`This is a text
with multiple lines.
Escapes are not interpreted,
\n is not a newline.`;
1 Note: this will be lost after minifying/obfuscating your code
You can have multiline strings in pure JavaScript.
This method is based on the serialization of functions, which is defined to be implementation-dependent. It does work in most browsers (see below), but there's no guarantee that it will still work in the future, so do not rely on it.
Using the following function:
function hereDoc(f) {
return f.toString().
replace(/^[^\/]+\/\*!?/, '').
replace(/\*\/[^\/]+$/, '');
}
You can have here-documents like this:
var tennysonQuote = hereDoc(function() {/*!
Theirs not to make reply,
Theirs not to reason why,
Theirs but to do and die
*/});
The method has successfully been tested in the following browsers (not mentioned = not tested):
IE 4 - 10
Opera 9.50 - 12 (not in 9-)
Safari 4 - 6 (not in 3-)
Chrome 1 - 45
Firefox 17 - 21 (not in 16-)
Rekonq 0.7.0 - 0.8.0
Not supported in Konqueror 4.7.4
Be careful with your minifier, though. It tends to remove comments. For the YUI compressor, a comment starting with /*! (like the one I used) will be preserved.
I think a real solution would be to use CoffeeScript.
ES6 UPDATE: You could use backtick instead of creating a function with a comment and running toString on the comment. The regex would need to be updated to only strip spaces. You could also have a string prototype method for doing this:
let foo = `
bar loves cake
baz loves beer
beer loves people
`.removeIndentation()
Someone should write this .removeIndentation string method... ;)
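A minimal sketch of such a helper, written as a standalone function rather than a String.prototype extension. Only the name removeIndentation comes from the answer above; the implementation, and the choice to strip the smallest common indentation plus the line breaks next to the backticks, are assumptions:

```javascript
// Hypothetical helper: strips the smallest common indentation from every line.
function removeIndentation(text) {
  // Drop the line break after the opening backtick and the one before the closing backtick.
  const lines = text
    .replace(/^[ \t]*\n/, '')
    .replace(/\n[ \t]*$/, '')
    .split('\n');
  // Measure the indentation of every non-blank line.
  const indents = lines
    .filter((line) => line.trim().length > 0)
    .map((line) => line.match(/^[ \t]*/)[0].length);
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  // Cut the common indentation off each line.
  return lines.map((line) => line.slice(common)).join('\n');
}

const foo = removeIndentation(`
    bar loves cake
    baz loves beer
    beer loves people
`);
console.log(foo);
```

Keeping it a plain function avoids modifying a built-in prototype, which can clash with other libraries.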
You can do this...
var string = 'This is\n' +
'a multiline\n' +
'string';
I came up with this very jimmy rigged method of a multi lined string. Since converting a function into a string also returns any comments inside the function you can use the comments as your string using a multilined comment /**/. You just have to trim off the ends and you have your string.
var myString = function(){/*
This is some
awesome multi-lined
string using a comment
inside a function
returned as a string.
Enjoy the jimmy rigged code.
*/}.toString().slice(14,-3)
alert(myString)
I'm surprised I didn't see this, because it works everywhere I've tested it and is very useful for e.g. templates:
<script type="bogus" id="multi">
My
multiline
string
</script>
<script>
alert($('#multi').html());
</script>
Does anybody know of an environment where there is HTML but it doesn't work?
I solved this by outputting a div, making it hidden, and calling the div id by jQuery when I needed it.
e.g.
<div id="UniqueID" style="display:none;">
Strings
On
Multiple
Lines
Here
</div>
Then when I need to get the string, I just use the following jQuery:
$('#UniqueID').html();
Which returns my text on multiple lines. If I call
alert($('#UniqueID').html());
I get:
There are multiple ways to achieve this
1. Slash concatenation
var MultiLine= '1\
2\
3\
4\
5\
6\
7\
8\
9';
2. regular concatenation
var MultiLine = '1'
+'2'
+'3'
+'4'
+'5';
3. Array Join concatenation
var MultiLine = [
'1',
'2',
'3',
'4',
'5'
].join('');
Performance wise, Slash concatenation (first one) is the fastest.
Refer this test case for more details regarding the performance
Update:
With the ES2015, we can take advantage of its Template strings feature. With it, we just need to use back-ticks for creating multi line strings
Example:
`<h1>{{title}}</h1>
<h2>{{hero.name}} details!</h2>
<div><label>id: </label>{{hero.id}}</div>
<div><label>name: </label>{{hero.name}}</div>
`
Using script tags:
add a <script>...</script> block containing your multiline text into head tag;
get your multiline text as is... (watch out for text encoding: UTF-8, ASCII)
<script>
// pure javascript
var text = document.getElementById("mySoapMessage").innerHTML ;
// using JQuery's document ready for safety
$(document).ready(function() {
var text = $("#mySoapMessage").html();
});
</script>
<script id="mySoapMessage" type="text/plain">
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:typ="...">
<soapenv:Header/>
<soapenv:Body>
<typ:getConvocadosElement>
...
</typ:getConvocadosElement>
</soapenv:Body>
</soapenv:Envelope>
<!-- this comment will be present on your string -->
//uh-oh, javascript comments... SOAP request will fail
</script>
I like this syntax and indentation:
string = 'my long string...\n'
+ 'continue here\n'
+ 'and here.';
(but actually can't be considered as multiline string)
Downvoters: This code is supplied for information only.
This has been tested in Fx 19 and Chrome 24 on Mac
var new_comment; /*<<<EOF
<li class="photobooth-comment">
<span class="username">
You:
</span>
<span class="comment-text">
$text
</span>
#<span class="comment-time">
2d
</span> ago
</li>
EOF*/
// note the script tag here is hardcoded as the FIRST tag
new_comment=document.currentScript.innerHTML.split("EOF")[1];
document.querySelector("ul").innerHTML=new_comment.replace('$text','This is a dynamically created text');
<ul></ul>
A simple way to write multiline strings in JavaScript is to use template literals (template strings), which are delimited by backticks (` `). You can also use variables inside a template string, like ` name is ${value} `:
const value = `multiline`
const text = `This is a
${value}
string in js`;
console.log(text);
There's this library that makes it beautiful:
https://github.com/sindresorhus/multiline
Before
var str = '' +
'<!doctype html>' +
'<html>' +
' <body>' +
' <h1>❤ unicorns</h1>' +
' </body>' +
'</html>' +
'';
After
var str = multiline(function(){/*
<!doctype html>
<html>
<body>
<h1>❤ unicorns</h1>
</body>
</html>
*/});
Found a lot of over engineered answers here.
The two best answers in my opinion were:
1:
let str = `Multiline string.
           foo.
           bar.`
which eventually logs:
Multiline string.
           foo.
           bar.
2:
let str = `Multiline string.
foo.
bar.`
That logs it correctly but it's ugly in the script file if str is nested inside functions / objects etc...:
Multiline string.
foo.
bar.
My really simple answer with regex which logs the str correctly:
let str = `Multiline string.
           foo.
           bar.`.replace(/\n +/g, '\n');
Please note that it is not the perfect solution but it works if you are sure that after the new line (\n) at least one space will come (+ means at least one occurrence). It also will work with * (zero or more).
You can be more explicit and use {n,} which means at least n occurrences.
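As a sketch of that stricter variant, assuming the continuation lines are indented by at least four spaces (the indentation width and the variable name are illustrative assumptions):

```javascript
// Assumes every continuation line starts with four or more spaces;
// a line indented by fewer spaces would be left untouched.
const str = `Multiline string.
    foo.
    bar.`.replace(/\n {4,}/g, '\n');

console.log(str);
```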
The equivalent in javascript is:
var text = `
This
Is
A
Multiline
String
`;
Here's the specification. See browser support at the bottom of this page. Here are some examples too.
This works in IE, Safari, Chrome and Firefox:
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js"></script>
<div class="crazy_idea" thorn_in_my_side='<table border="0">
<tr>
<td ><span class="mlayouttablecellsdynamic">PACKAGE price $65.00</span></td>
</tr>
</table>'></div>
<script type="text/javascript">
alert($(".crazy_idea").attr("thorn_in_my_side"));
</script>
to sum up, I have tried 2 approaches listed here in user javascript programming (Opera 11.01):
this one didn't work: Creating multiline strings in JavaScript
this worked fairly well, I have also figured out how to make it look good in Notepad++ source view: Creating multiline strings in JavaScript
So I recommend the working approach for Opera user JS users. Unlike what the author was saying:
It doesn't work on firefox or opera; only on IE, chrome and safari.
It DOES work in Opera 11. At least in user JS scripts. Too bad I can't comment on individual answers or upvote the answer, I'd do it immediately. If possible, someone with higher privileges please do it for me.
Exact
Ruby produces "This\nIs\nA\nMultiline\nString\n"; the JS below produces exactly the same string:
text = `This
Is
A
Multiline
String
`
// TEST
console.log(JSON.stringify(text));
console.log(text);
This is an improvement on Lonnie Best's answer, because the new-line characters in his answer are not in exactly the same positions as in the Ruby output.
My extension to https://stackoverflow.com/a/15558082/80404.
It expects a comment of the form /*! any multiline comment */, where the ! prevents the comment from being removed by minification (at least for YUI compressor).
Function.prototype.extractComment = function() {
var startComment = "/*!";
var endComment = "*/";
var str = this.toString();
var start = str.indexOf(startComment);
var end = str.lastIndexOf(endComment);
return str.slice(start + startComment.length, -(str.length - end));
};
Example:
var tmpl = function() { /*!
<div class="navbar-collapse collapse">
<ul class="nav navbar-nav">
</ul>
</div>
*/}.extractComment();
Updated for 2015: it's six years later now: most people use a module loader, and the main module systems each have ways of loading templates. It's not inline, but the most common type of multiline string are templates, and templates should generally be kept out of JS anyway.
require.js: 'require text'.
Using require.js 'text' plugin, with a multiline template in template.html
var template = require('text!template.html')
NPM/browserify: the 'brfs' module
Browserify uses a 'brfs' module to load text files. This will actually build your template into your bundled HTML.
var fs = require("fs");
var template = fs.readFileSync('template.html', 'utf8');
Easy.
If you're willing to use the escaped newlines, they can be used nicely. It looks like a document with a page border.
The easiest way to make multiline strings in JavaScript is to use backticks ( `` ). This allows you to create multiline strings in which you can insert variables with ${variableName}.
Example:
let name = 'Willem';
let age = 26;
let multilineString = `
my name is: ${name}
my age is: ${age}
`;
console.log(multilineString);
Compatibility:
It was introduced in ES6/ES2015.
It is now natively supported by all major browser vendors (except Internet Explorer).
Check exact compatibility in Mozilla docs here
The ES6 way of doing it would be by using template literals:
const str = `This
is
a
multiline text`;
console.log(str);
More reference here
You can use TypeScript (JavaScript SuperSet), it supports multiline strings, and transpiles back down to pure JavaScript without overhead:
var templates = {
myString: `this is
a multiline
string`
}
alert(templates.myString);
If you'd want to accomplish the same with plain JavaScript:
var templates =
{
myString: function(){/*
This is some
awesome multi-lined
string using a comment
inside a function
returned as a string.
Enjoy the jimmy rigged code.
*/}.toString().slice(14,-3)
}
alert(templates.myString)
Note that the iPad/Safari does not support 'functionName.toString()'
If you have a lot of legacy code, you can also use the plain JavaScript variant in TypeScript (for cleanup purposes):
interface externTemplates
{
myString:string;
}
declare var templates:externTemplates;
alert(templates.myString)
and you can use the multiline-string object from the plain JavaScript variant, where you put the templates into another file (which you can merge in the bundle).
You can try TypeScript at
http://www.typescriptlang.org/Playground
ES6 allows you to use a backtick to specify a string on multiple lines. It's called a Template Literal. Like this:
var multilineString = `One line of text
second line of text
third line of text
fourth line of text`;
Using the backtick works in NodeJS, and it's supported by Chrome, Firefox, Edge, Safari, and Opera.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
Also note that when you extend a string over multiple lines using a backslash at the end of each line, any extra characters after the backslash (mostly spaces, tabs and comments added by mistake) will cause an unexpected character error, which took me an hour to find out.
var string = "line1\ // comment, space or tabs here raise error
line2";
Please for the love of the internet use string concatenation and opt not to use ES6 solutions for this. ES6 is NOT supported all across the board, much like CSS3 and certain browsers being slow to adapt to the CSS3 movement. Use plain ol' JavaScript, your end users will thank you.
Example:
var str = "This world is neither flat nor round. "+
"Once was lost will be found";
You can use tagged templates to make sure you get the desired output.
For example:
// Merging multiple whitespaces and trimming the output
const t = (strings) => { return strings.map((s) => s.replace(/\s+/g, ' ')).join("").trim() }
console.log(t`
This
Is
A
Multiline
String
`);
// Output: 'This Is A Multiline String'
// Similar, but keeping the line breaks:
const tW = (strings) => { return strings.map((s) => s.replace(/\s+/g, '\n')).join("").trim() }
console.log(tW`
This
Is
A
Multiline
String
`);
// Output: 'This\nIs\nA\nMultiline\nString'
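The two tag functions above only look at the literal parts and ignore any interpolated values. Here is a sketch of a variant that also weaves in ${...} substitutions; the name oneLine and the sample variable are hypothetical:

```javascript
// Hypothetical tag: joins the literal parts with the interpolated values,
// then collapses every whitespace run into a single space.
const oneLine = (strings, ...values) =>
  strings
    .reduce(
      (out, part, i) => out + part + (i < values.length ? String(values[i]) : ''),
      ''
    )
    .replace(/\s+/g, ' ')
    .trim();

const kind = 'Multiline';
const result = oneLine`
  This
  Is
  A
  ${kind}
  String
`;
console.log(result);
```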
Multiline string with variables
var x = 1
string = string + `<label class="container">
<p>${x}</p>
</label>`;

How do I not escape HTML special characters in QDomText?

I'm trying to create an HTML document containing Javascript using Qt XML. Here is the relevant part of my Qt code:
QDomDocument document;
//Create head, body, etc
QDomElement script = document.createElement("script");
script.setAttribute("type", "text/javascript");
body.appendChild(script); //body is the QDomElement representing the <body> tag
QDomText scriptText = document.createTextNode("if(x < 0){\n/*do something*/\n}");
script.appendChild(scriptText);
QFile file("C:\\foo.html");
file.open(QIODevice::WriteOnly);
QTextStream stream(&file);
stream << document.toString();
The problem is that the < character in the Javascript code is escaped and replaced with &lt;, which gives the following output that isn't valid Javascript:
<script type="text/javascript">
if(x &lt; 0){
/*do something*/
}
</script>
I've searched the Qt documentation for a solution, but haven't found anything.
A workaround could be to replace &lt; with < when writing to the file by doing stream << document.toString().replace("&lt;", "<"), but there might also be occurrences of &lt; outside of the Javascript code that I want to leave alone.
I can also think of a few Javascript tricks to check if a number is negative without using any special HTML characters, like for example if(String(x).indexOf('-') != -1), but I would like to know if there is a better way of doing it.
My question is how do I create a QDomText object with text containing special HTML characters like <, >, &, etc without them being escaped in QDomDocument::toString()?
You can put the javascript code in a CDATA section:
QDomElement script = document.createElement("script");
script.setAttribute("type", "text/javascript");
body.appendChild(script);
QString js = "if(x < 0){\n/*do something*/\n}";
QDomCDATASection data = document.createCDATASection(js);
script.appendChild(data);
then remove the unwanted text right after:
QString text = document.toString();
text.replace("<![CDATA[", "\n");
text.replace("]]>", "\n");

Find the word and replace with html tag using regex

I have a text equation like: 10x^2-8y^2-7k^4=0.
How can I find the ^ and replace it with <sup>2</sup> in the whole string using regex. The result should be like:
I tried str = str.replace(/\^\s/g, "<sup>$1</sup> ") but I’m not getting the expected result.
Any ideas that can help to solve my problem?
I think you're looking for something like
\^(\d+)
It matches the ^, captures the exponent and replace with
<sup>$1</sup>
See it here at regex101.
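Applied to the equation from the question, the pattern and replacement can be sketched like this (the variable names are illustrative):

```javascript
// Wrap every run of digits that follows a caret in <sup> tags.
const equation = '10x^2-8y^2-7k^4=0';
const html = equation.replace(/\^(\d+)/g, '<sup>$1</sup>');

console.log(html);
```

The g flag is what makes the replacement apply to every exponent rather than only the first one.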
Edit:
To meet your new demands, check this fiddle. It handles the sub as well using replace with a function.
Your current pattern matches a caret followed by a whitespace character (space, tab, new-line, etc.), but you want to match a caret followed by either a single character or multiple characters wrapped in braces, as your string is in TeX.
/\^(?:([\w\d])|\{([\w\d]{2,})\})/g
Now, using str = str.replace(/\^(?:([\w\d])|\{([\w\d]{2,})\})/g, "<sup>$1$2</sup>"); should do the job. Both groups are referenced because only one of them captures on any given match, and the other expands to an empty string.
You can make a more generic function from this expression that can wrap characters prefixed by a specific character with a specific tag.
function wrapPrefixed(string, prefix, tagName) {
// The "g" flag replaces every occurrence, not just the first one.
return string.replace(new RegExp("\\" + prefix + "(?:([\\w\\d])|\\{([\\w\\d]{2,})\\})", "g"), "<" + tagName + ">$1$2</" + tagName + ">");
}
For instance, calling wrapPrefixed("1_2 + 4_{32}", "_", "sub"); results in 1<sub>2</sub> + 4<sub>32</sub>.

How to remove more than one whitespace character from HTML?

I want to remove extra whitespace which is coming from the user end, but I can't predict the format of the HTML.
For example:
<p> It's interesting that you would try cfsetting, since nothing in it's
documentation would indicate that it would do what you are asking.
Unless of course you were mis-reading what "enableCFoutputOnly" is
supposed to do.
</p>
<p>
It's interesting that you would try cfsetting, since nothing in it's
documentation would indicate that it would do what you are asking.
Unless of course you were mis-reading what "enableCFoutputOnly" is
supposed to do.</p>
Please guide me on how to remove more than one whitespace character from HTML.
You could use a regex to replace every run of multiple whitespace characters with a single space, looping over the result until no more multiple-whitespace occurrences exist:
lastTry = "<p> lots of space </p>";
nextTry = rereplace(lastTry,"\s\s", " ", "all");
while(nextTry != lastTry) {
lastTry = nextTry;
nextTry = REReplace(lastTry,"\s\s", " ", "all");
}
Tested working in CF10.
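The loop is needed only because the pattern matches exactly two whitespace characters at a time. A pattern that matches a run of two or more collapses each run in a single pass. The idea is sketched here in JavaScript rather than ColdFusion; the function name is hypothetical, and, like the loop above, it makes no attempt to protect whitespace-sensitive elements such as <pre>:

```javascript
// Hypothetical helper: replaces each run of two or more whitespace
// characters (spaces, tabs, line breaks) with a single space.
function collapseWhitespace(html) {
  return html.replace(/\s{2,}/g, ' ');
}

console.log(collapseWhitespace('<p>   lots   of \n\n space   </p>'));
```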
If you don't want to do it in code, out of sheer laziness:
=> http://jsbeautifier.org/
if you want to do it by code then a regex would be another option
This should do it:
<cfscript>
string function stripCRLFAndMultipleSpaces(required string theString) {
local.result = trim(rereplace(trim(arguments.theString), "([#Chr(09)#-#Chr(30)#])", " ", "all"));
local.result = trim(rereplace(local.result, "\s{2,}", " ", "all"));
return local.result;
}
</cfscript>

Replace continuous space with single space and multiple "&nbsp" elements

I have one html document which contains whitespaces in some nodes. For example,
<B>This is Whitespace Node </B>
When this HTML is displayed in the browser, more than one continuous space is always displayed as a single space. To avoid this, I want to replace the continuous spaces with a single space and multiple &nbsp; elements.
What is the best solution to achieve this?
I am using C# 2005.
Try this,
string str = "<B>This is Whitespace Node </B>";
Regex rgx = new Regex("([\\S][ ])");
string result = rgx.Replace(str, "$1.")
.Replace(" .","?")
.Replace(" ","&nbsp")
.Replace("?"," ");
Use CSS's white-space property as per http://www.w3.org/TR/CSS2/text.html#white-space-prop
white-space: pre-wrap
Or, if you really want to do it with bruteforce, replace two consecutive spaces with a non-breaking-space and a normal space... I strongly recommend against this.
string text = originalText.Replace("  ", "&nbsp; ");
You can try
String.Replace(" ", " ")
if you prefer regex
Regex rgx = new Regex("([ \t]|&nbsp;)+");
string result = rgx.Replace(input, " ");
I assume you are setting the value of the control from code behind? If so then ...
<strong><asp:Literal id="myLiteral">This is Whitespace Node </asp:Literal></strong>
And in code behind ...
var myText = "This is Whitespace Node ";
myLiteral.Text = myText.Replace(" ", "&nbsp;");
If no code behind or not in a literal ...
<strong><%= "This is Whitespace Node ".Replace(" ", "&nbsp;") %></strong>