Typing is messy if I use html with QTextEdit

I'm trying to change the attributes of single words, such as font and color. QTextEdit allows me to set the text as HTML via setHtml(htmlText). After setting a QString as HTML, typing becomes messy: I can't type spaces or hit Enter, and sometimes words are written backward.
void MainWindow::on_textEdit_textChanged()
{
    QString plainText = ui->textEdit->toPlainText();
    QString htmlText = "<font color='red'>" + plainText + "</font>";
    disconnect(ui->textEdit, SIGNAL(textChanged()), this, SLOT(on_textEdit_textChanged()));
    ui->textEdit->setHtml(htmlText);
    QTextCursor cursor(ui->textEdit->textCursor());
    cursor.movePosition(QTextCursor::EndOfWord);
    ui->textEdit->setTextCursor(cursor);
    connect(ui->textEdit, SIGNAL(textChanged()), this, SLOT(on_textEdit_textChanged()));
}
The color is set correctly, but typing is inconsistent. I'm not an expert in HTML. Any suggestions?

HTML is a transfer representation for the syntax tree of the document. You need to be modifying one or the other, otherwise you'll face the fallout from interactions between the two. Choose one and stick to it.
Since you're using the QTextDocument interface, you should be making all changes using that interface. There's no need to deal with HTML directly then. To change attributes of a chunk of text, select the text, then manipulate it via the cursor API.

Html Encoding Texts in Razor Views

Text and markup are rendered to the output as-is, without any HTML encoding, as we would expect.
In the following, the plain text next to the markup should be HTML-encoded. (We don't care about the code output here.)
@{ var theVar = "xyz"; }
some text & other text >>@theVar
So, the HTML in the output is:
some text & other text >>xyz
So, when we want to write some static text that needs to be HTML-encoded, we have to use constructs like:
@{ var theVar = "xyz"; }
@("some text & other text >>")@theVar
to get the following HTML in the output:
some text &amp; other text &gt;&gt;xyz
and, for clarity, when viewed in a browser:
some text & other text >>xyz
So, is there a simple way of doing this? Some shortcut to HTML-encode text instead of using @("...") for each piece, which starts to look nasty when there are many of them.
What would be the best practice? How do you do this?
So, it is not a big concern when we specify UTF-8 encoding for the document. When UTF-8 is used, it is not required to HTML-encode characters as entity references, except for the special characters (<, >, &, ", ').
Even using & by itself is not wrong for lenient browsers, but there are ambiguous cases to consider, like volt&amp. So, it would be better to HTML-encode all of these special characters.
See the "When to use escapes" section of the W3C article:
http://www.w3.org/International/questions/qa-escapes#use
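To illustrate the point about UTF-8 documents, here is a minimal sketch in Python rather than Razor (the helper name encode_text is made up for this example). It uses the standard library's html.escape, which touches only the five special characters and leaves other non-ASCII text alone:

```python
import html

def encode_text(text: str) -> str:
    """Escape only the characters that are special in HTML: & < > " '."""
    # quote=True also escapes both quote characters, which matters in attributes.
    return html.escape(text, quote=True)

static_text = "some text & other text >>"
the_var = "xyz"

encoded = encode_text(static_text) + encode_text(the_var)
print(encoded)  # some text &amp; other text &gt;&gt;xyz

# Non-ASCII characters pass through unchanged; in a UTF-8 document
# they do not need to be written as entity references.
accented = encode_text("Hergé <b>")
print(accented)  # Hergé &lt;b&gt;
```

This is only the encoding step in isolation; it says nothing about how Razor itself decides what to encode.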

How to render parentheses as part of url in gtk label?

I am using gtk in an application and I make use of the abilities of gtklabel text to be rendered automatically as a clickable url. This works well most of the time, however with a url which contains parentheses "(" and ")" this does not work. The versions I use are the ones available on debian (old)stable, i.e. debian 6 (2.20) and 7 (3.4.2).
For example, I am trying to display the following url:
https://maps.google.com/maps?q=62.1891,+-141.5372+(Example+text+in+here+will+be+rendered+in+the+maps+label)&iwloc=A&hl=en
When I create a gtklabel with this text, for example:
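text="<b><a href=\"https://maps.google.com/maps?q=62.1891,+-141.5372+(Example+text+in+here+will+be+rendered+in+the+maps+label)&iwloc=A&hl=en\">Click here for Map</a></b>\n"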
Then it will display fine in the label as an underlined link in bold with the text Click here for Map
However when you click the link it will not show correctly and this error appears:
Gtk-WARNING **: Unable to show '(null)': Operation not supported
It looks like the parentheses mess up the rendering of the url by gtk.
Is there a way to escape the parentheses, or use a different character that works in the map url to create the label?
I have tried various ways of escaping them, such as replacing the parentheses with %28 and %29 and using backslashes as an escape character, but none has been effective so far.
I am using the method described in https://developer.gnome.org/gtk2/2.24/GtkLabel.html and https://developer.gnome.org/gtk3/stable/GtkLabel.html under "Links" which allows automatic rendering of links:
Links
Since 2.18, GTK+ supports markup for clickable hyperlinks in addition
to regular Pango markup. The markup for links is borrowed from HTML,
using the a with href and title attributes. GTK+ renders links similar
to the way they appear in web browsers, with colored, underlined text.
The title attribute is displayed as a tooltip on the link. An example
looks like this:
gtk_label_set_markup (label, "Go to the <a href=\"http://www.gtk.org\" title=\"&lt;i&gt;Our&lt;/i&gt; website\">GTK+ website</a> for more...");
I understand it is working in more recent releases of gtk (2.24 and 3.6), making sure to escape ampersands. But I was wondering if there is a work around for older gtk versions to avoid this problem?
You should be escaping your ampersands with &amp;.
I'm pretty sure GTK prints out a runtime warning telling you this when you call gtk_label_set_markup().
Here's the warning on GTK 3.6.4:
Gtk-WARNING **: Failed to set text from markup due to error parsing markup: Error on line 1: Entity did not end with a semicolon; most likely you used an ampersand character without intending to start an entity - escape ampersand as &amp;
jku is right, the ampersands need to be escaped. Here's an example using the very same string as you, and it works (tested on 3.6.4 and 2.24.17).
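#include <gtk/gtk.h>

int
main (int argc, char **argv)
{
    gtk_init (&argc, &argv);
    GtkWidget *window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
    // This one won't work, needs ampersand escaping
    // GtkWidget *label = gtk_label_new ("<b><a href=\"https://maps.google.com/maps?q=62.1891,+-141.5372+(Example+text+in+here+will+be+rendered+in+the+maps+label)&iwloc=A&hl=en\">Click here for Map</a></b>\n");
    // Same link with each & written as &amp;
    GtkWidget *label = gtk_label_new ("<b><a href=\"https://maps.google.com/maps?q=62.1891,+-141.5372+(Example+text+in+here+will+be+rendered+in+the+maps+label)&amp;iwloc=A&amp;hl=en\">Click here for Map</a></b>\n");
    gtk_label_set_use_markup (GTK_LABEL (label), TRUE);
    gtk_container_add (GTK_CONTAINER (window), label);
    gtk_widget_show_all (GTK_WIDGET (window));
    g_signal_connect (window, "destroy", G_CALLBACK (gtk_main_quit), NULL);
    gtk_main ();
    return 0;
}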
Original answer:
Have you tried to call gtk_show_uri with that link? You could then see if that's a problem with what handles URI's, or if it's the way your label is formatted/constructed.
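If the URL is not a fixed string, the ampersand escaping recommended in the answers above can be done programmatically before the markup is handed to the label. A rough sketch in Python, using only the standard library (link_markup is a made-up helper name; in C, GLib's g_markup_escape_text() serves the same purpose):

```python
from xml.sax.saxutils import escape, quoteattr

def link_markup(url: str, label: str) -> str:
    """Build Pango-style link markup with the URL and label escaped."""
    # quoteattr() escapes &, < and > and wraps the value in quotes;
    # parentheses are left untouched because they are not special in markup.
    return "<b><a href={}>{}</a></b>".format(quoteattr(url), escape(label))

url = ("https://maps.google.com/maps?q=62.1891,+-141.5372+"
       "(Example+text+in+here+will+be+rendered+in+the+maps+label)"
       "&iwloc=A&hl=en")
markup = link_markup(url, "Click here for Map")
print(markup)
```

The resulting string is what would be passed to gtk_label_set_markup(); the parentheses stay as they are and only the ampersands change.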

Extracting the first formatted line from some RTF/HTML text

OK, I painted myself into a corner on this one and haven't decided the way out yet.
My web application hosts a series of documents written by users, and edited with the CLEditor editor via PrimeFaces. The documents can be any size and have any formatting the user chooses.
What I want to do is treat the first line of the document as a title, so that when I create a listing of those documents I show only the title, then the user can click on that table row to see the whole document. I show the title with
<h:outputText value="#{backBean.doc}" escape="false" />
What I did is pull out the substring of the document up to, but not including, the first occurrence of the br tag. That works unless the user applies formatting that spans past that point. The resulting string then has unclosed HTML tags (usually div or span), and when they are output without escaping they interfere with or even blank out the rest of the page.
So I am looking for an easy solution to fix the HTML fragment. I would rather not import a huge library such as JTidy because it pulls in all sorts of dependencies I don't have right now like a DOM parser, etc. Can anyone suggest a cheaper yet robust solution? Is there any way to clean this up on the client side?
I'd suggest Jsoup.
To parse the HTML and get its <body> content, it's a matter of this oneliner:
String htmlBody = Jsoup.parse(userInput).body().html();
By the way, since you seem to intend to redisplay user-controlled HTML unescaped, I strongly recommend whitelisting it to prevent XSS. E.g.
String safeHtmlBody = Jsoup.clean(htmlBody, Whitelist.basic());
This way you can safely redisplay it without worrying about an XSS attack hole:
<h:outputText value="#{bean.safeHtmlBody}" escape="false" />
See also:
What are the pros and cons of the leading Java HTML parsers?
How to implement a possibility for user to post some html-formatted data in a safe way?
CSRF, XSS and SQL Injection attack prevention in JSF
You should be escaping the partial contents of the document somehow, otherwise users can upload documents containing HTML/JavaScript code that will compromise your site. As you can see, even simple formatting can break it. One solution could be to remove all tags (via regex, string replace, etc) and then escape the title.
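The "remove all tags, then escape the title" idea can be sketched roughly like this. It is written in Python with the standard library's html.parser, not in Java, and the class and function names are made up for the illustration. It is a sketch under the assumption that the title is whatever text precedes the first line break or block boundary:

```python
import html
from html.parser import HTMLParser

class FirstLineText(HTMLParser):
    """Collect plain text up to the first line break or block boundary."""
    BREAKS = {"br", "p", "div", "li", "h1", "h2", "h3", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.done = False

    def _maybe_stop(self, tag):
        # A break only ends the title once some visible text has been seen.
        if tag in self.BREAKS and "".join(self.parts).strip():
            self.done = True

    def handle_starttag(self, tag, attrs):
        self._maybe_stop(tag)

    def handle_endtag(self, tag):
        self._maybe_stop(tag)

    def handle_data(self, data):
        if not self.done:
            self.parts.append(data)

def extract_title(document_html: str) -> str:
    parser = FirstLineText()
    parser.feed(document_html)
    parser.close()
    text = " ".join("".join(parser.parts).split())  # normalize whitespace
    return html.escape(text)  # safe to output, even with escape="false"

title = extract_title('<div><span style="color:red">Tips &amp; <b>tricks</b><br>body text</span></div>')
print(title)  # Tips &amp; tricks
```

Since all tags are dropped, the title loses its formatting, but nothing can be left unclosed and nothing user-supplied reaches the page as markup.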
I figured out the JTidy way of doing it. This seems very heavy-handed to me, but I'm going with it until something better is suggested. It might also be useful if someone else is in this situation:
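import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.NodeList;

public class TitleRTF {
    // DOTALL lets '.' match line breaks inside the serialized <body> element
    private static final Pattern pTidy = Pattern.compile("<body>(.*)</body>", Pattern.DOTALL);

    public TitleRTF() {}

    public static String getTitle(String rtfSource) {
        org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
        tidy.setQuiet(true);
        ByteArrayInputStream bais = new ByteArrayInputStream(rtfSource.getBytes());
        // Let JTidy repair the fragment and build a DOM from it
        org.w3c.dom.Document doc = tidy.parseDOM(new BufferedInputStream(bais), null);
        try {
            Transformer tr = TransformerFactory.newInstance().newTransformer();
            StreamResult result = new StreamResult(new StringWriter());
            NodeList list = doc.getElementsByTagName("body");
            if (list.getLength() > 0) {
                // Serialize the repaired <body> and keep only its contents
                DOMSource source = new DOMSource(list.item(0));
                tr.transform(source, result);
                String text = result.getWriter().toString();
                Matcher m = pTidy.matcher(text);
                if (m.find()) return m.group(1);
            }
        } catch (TransformerException ex) {
            // fall through to the "(not parsable)" result
        }
        return "(not parsable)";
    }
}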
One thing that needs to be added to this is a way of keeping JTidy from logging what it sees as HTML errors. The setQuiet(true) doesn't seem to do it.

How can I keep track of a section of text in a Dynamic TextField in AS3

I want to be able to apply non-style attributes to sections of text in a TextField. For example characters 30-45 will be set to animate in a certain direction.
As this field is editable characters 30-45 may no longer be at 30-45 if the text is edited in any way.
Can anyone think of an elegant way to keep track of which characters had the attributes applied to them?
I've had a similar project and ended up extending the TextField class to fit my needs. Here's a short description of what's to do - my actual code is confidential, I'm afraid:
1. Override the setters for text and htmlText.
2. Parse any content from these setters into an array of custom objects. Each of these objects contains a raw text chunk and the metadata that applies to it (format, comments, etc.).
For example,
<span class="sometext" animation="true">Info</span>
would be translated to an object like this:
{ text:"Info", clazz:"sometext", animation:true };
3. The actual text output is then rendered by using appendText to add the raw text chunk by chunk, and using setTextFormat to apply formatting (or do whatever else is necessary) after each append step.
4. Add event listeners to react to TEXT_INPUT and/or KEY_DOWN/KEY_UP events to catch any new user input. (You will replace the entire text content of your TextField over and over again, so it's not an option to use super.text.)
5. User input is processed by using selectionBeginIndex and selectionEndIndex (count the number of characters in the raw text of your object array to find out which chunks are affected). Add or replace the new text directly within the container objects, then use step 3 to refresh the entire text in the TextField.
6. I have also added a method that reduces the array before it is rendered (i.e. combines adjacent chunks with identical metadata). This keeps the array lean and helps create XML output that does not have a complicated tree structure (one-dimensional is quite what we like for this kind of scenario).
7. Override the getters for text and htmlText to return the newly formatted info, if you need the results somewhere else. I've used htmlText to return a fully decorated XML string and kept text for accessing the raw text content, just like in a generic TextField.
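The chunk bookkeeping described above can be sketched independently of Flash. The following is Python, not AS3, and every name in it (Chunk, insert_text, merge_adjacent, ranges_with) is invented for the illustration. It only shows how ranges stay attached to their text when the content is edited:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    meta: dict = field(default_factory=dict)

def insert_text(chunks, index, new_text):
    """Insert new_text at a character index; it joins the chunk it lands in."""
    pos = 0
    for chunk in chunks:
        end = pos + len(chunk.text)
        if index <= end:  # at a boundary, the earlier chunk wins
            offset = index - pos
            chunk.text = chunk.text[:offset] + new_text + chunk.text[offset:]
            return
        pos = end
    chunks.append(Chunk(new_text))  # past the end: new chunk without metadata

def merge_adjacent(chunks):
    """Combine neighbouring chunks with identical metadata, drop empty ones."""
    merged = []
    for chunk in chunks:
        if not chunk.text:
            continue
        if merged and merged[-1].meta == chunk.meta:
            merged[-1].text += chunk.text
        else:
            merged.append(Chunk(chunk.text, dict(chunk.meta)))
    return merged

def ranges_with(chunks, key):
    """Return the (start, end) character ranges whose metadata sets key."""
    result, pos = [], 0
    for chunk in chunks:
        end = pos + len(chunk.text)
        if chunk.meta.get(key):
            result.append((pos, end))
        pos = end
    return result

chunks = [Chunk("Hello "), Chunk("Info", {"animation": True}), Chunk(" world")]
before = ranges_with(chunks, "animation")   # [(6, 10)]
insert_text(chunks, 0, ">> ")               # the user types at the start
after = ranges_with(chunks, "animation")    # [(9, 13)]
print(before, after)
```

The animated range moves from 6-10 to 9-13 without any index arithmetic on the caller's side, because the attribute is stored with the text chunk and the positions are recomputed from chunk lengths.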

How to stop an html TEXTAREA from decoding html entities

I have a strange problem:
In the database, I have a literal ampersand lt semicolon:
&lt;div
Whenever it's printed into an HTML textarea tag, the source code of the page shows the &gt; as >.
How do I stop this decoding?
You can't stop entities being decoded in a textarea since the content of a textarea is not (unlike a script or style element) intrinsic CDATA, even though error recovery may sometimes give the impression that it is.
The definition of the textarea element is:
<!ELEMENT TEXTAREA - - (#PCDATA) -- multi-line text field -->
i.e. it contains PCDATA which is described as:
Document text (indicated by the SGML construct "#PCDATA"). Text may contain character references. Recall that these begin with & and end with a semicolon (e.g., Herg&eacute;'s adventures of Tintin contains the character entity reference for the e acute character).
This means that when you type (the invalid HTML of) "start of tag" (<) the browser corrects it to "less than sign" (&lt;) but when you type "start of entity" (&), which is allowed, no error correction takes place.
You need to write what you mean. If you want to include some HTML as data then you must convert any character with special meaning to its respective character reference.
If the data is:
&lt;div
Then the HTML must be:
<textarea>&amp;lt;div</textarea>
You can use the standard functions for converting this (e.g. PHP's htmlspecialchars or Perl's HTML::Entities module).
NB 1: If you were using XHTML[2] (and really using it, it doesn't count if you serve it as text/html) then you could use an explicit CDATA block:
<textarea><![CDATA[&lt;div]]></textarea>
NB 2: Or if browsers implemented HTML 4 correctly
OK, but the question is: why does it decode them anyway? Assuming I've added &lt; and saved the textarea, it will be saved as &lt; but displayed as <. Saving it again will convert it back to < (but it will remain &lt; in the database); saving again will save it as < in the database. Why does the textarea decode it?
The server sends (to the browser) data encoded as HTML.
The browser sends (to the server) data encoded as application/x-www-form-urlencoded (or multipart/form-data).
Since the browser is not sending the data as HTML, the characters are not represented as HTML entities.
If you take the data received from the client and then put it into an HTML document, then you must encode it as HTML first.
In PHP, this can be done using htmlentities(). Example below.
<?php
$content = "This string contains the TM symbol: &#8482;";
print "<textarea>". htmlentities($content) ."</textarea>";
?>
Without htmlentities(), the textarea would interpret and display the TM symbol (™) instead of "&#8482;".
http://php.net/manual/en/function.htmlentities.php
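The round trip described in this answer (the server encodes as HTML, the browser decodes once) can be checked with a few lines of Python. This is only a sketch: textarea_html is a made-up helper, and html.unescape stands in for what the browser does when it parses the textarea's content:

```python
import html

stored = "&lt;div"  # the literal text held in the database

def textarea_html(value: str) -> str:
    """Encode a value as HTML before placing it inside a textarea."""
    return "<textarea>" + html.escape(value, quote=False) + "</textarea>"

page = textarea_html(stored)
print(page)  # <textarea>&amp;lt;div</textarea>

# The browser decodes character references exactly once when parsing.
inner = page[len("<textarea>"):-len("</textarea>")]
shown = html.unescape(inner)

# Without the encoding step, that single decode changes the data.
shown_unencoded = html.unescape(stored)
print(shown, shown_unencoded)  # &lt;div <div
```

With the encoding step the user sees, and later submits, exactly what was stored; without it, the stored &lt; silently turns into <.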
You have to be sure that this is rendered to the browser:
<textarea name="somename">&amp;lt;div</textarea>
Essentially, this means that the & in &lt; has to be HTML-encoded to &amp;. How to do it will depend on the technologies you're using.
UPDATE: Think about it like this. If you want to display <div> inside a textarea, you'll have to encode <> because otherwise, <div> would be a normal HTML element to the browser:
<textarea name="somename">&lt;div&gt;</textarea>
Having said this, if you want to display &lt;div&gt; inside a textarea, you'll have to encode & again, because the browser decodes HTML entities when rendering HTML. It has nothing to do with your database.
You can serve your DB-content from a separate page and then place it in the textarea using a Javascript (jQuery) Ajax-call:
request = $.ajax({
    type: "GET",
    url: "url-with-the-troubled-content.php",
    success: function (data) {
        document.getElementById('id-of-text-area').value = data;
    }
});
Explained at
http://www.endtask.net/how-to-prevent-a-textarea-element-from-decoding-html-entities/
I had the same problem and I just made two replacements on the text to show from the database before letting it into the text area:
myString = Replace(myString, "&", "&amp;")
myString = Replace(myString, "<", "&lt;")
Replacement no. 1 tricks the textarea into showing the codes.
Replacement no. 2: without this replacement you cannot show the word "</textarea>" inside the textarea (it would end the textarea tag).
(ASP/VBScript code above; translate it to the replace method of your language of choice.)
I found an alternative solution for reading and working with the content in the browser: simply read the element's text() using jQuery. It returns the characters as displayed, which lets me write from a textarea to a div's innerHTML via html().
With only JS and HTML...
...to answer the actual question, with a bare-minimal example:
<textarea id=myta></textarea>
<script id=mytext type=text/plain>
&trade;
</script>
<script> myta.value = mytext.innerText; </script>
Explanation:
Script tags do not render HTML or entities. Text stored in a script tag remains unadulterated; the problem is that the browser will try to execute it as JavaScript. So we use an empty textarea and store the text in a script tag (here, the first one).
To prevent that, we change the MIME type to text/plain instead of its default, which is text/javascript. This will prevent it from running.
Then to populate the textarea, we copy the script tag's content to it (here done in the second script tag).
The only caveats I have found with this are you have to use JavaScript and you cannot include script tags directly in it.