Set TextView text from html-formatted string resource in XML - html

I have some fixed strings inside my strings.xml, something like:
<resources>
<string name="somestring">
<B>Title</B><BR/>
Content
</string>
</resources>
and in my layout I've got a TextView which I'd like to fill with the html-formatted string.
<TextView android:id="#+id/formattedtext"
android:layout_width="fill_parent"
android:layout_height="wrap_content"
android:text="#string/htmlstring"/>
if I do this, the content of formattedtext is just the content of somestring stripped of any html tags and thus unformatted.
I know that it is possible to set the formatted text programmatically with
.setText(Html.fromHtml(somestring));
because I use this in other parts of my program where it is working as expected.
To call this function I need an Activity, but at the moment my layout is just a simple more or less static view in plain XML and I'd prefer to leave it that way, to save me from the overhead of creating an Activity just to set some text.
Am I overlooking something obvious? Is it not possible at all? Any help or workarounds welcome!
Edit: Just tried some things and it seems that HTML formatting in xml has some restraints:
tags must be written lowercase
some tags which are mentioned here do not work, e.g. <br/> (it's possible to use \n instead)

Just in case anybody finds this, there's a nicer alternative that's not documented (I tripped over it after searching for hours, and finally found it in the bug list for the Android SDK itself). You CAN include raw HTML in strings.xml, as long as you wrap it in
<![CDATA[ ...raw html... ]]>
Edge Cases:
Characters like apostrophe ('), double-quote ("), and ampersand (&) only need to be escaped if you want them to appear in the rendered text AS themselves, but they COULD be plausibly interpreted as HTML.
' and " can be represented as\' and \", or &apos; and ".
< and > always need to be escaped as < and > if you literally want them to render as '<' and '>' in the text.
Ampersand (&) is a little more complicated.
Ampersand followed by whitespace renders as ampersand.
Ampersand followed by one or more characters that don't form a valid HTML entity code render as Ampersand followed by those characters. So... &qqq; renders as &qqq;, but <1 renders as <1.
Example:
<string name="nice_html">
<![CDATA[
<p>This is a html-formatted \"string\" with <b>bold</b> and <i>italic</i> text</p>
<p>This is another paragraph from the same \'string\'.</p>
<p>To be clear, 0 < 1, & 10 > 1<p>
]]>
</string>
Then, in your code:
TextView foo = (TextView)findViewById(R.id.foo);
foo.setText(Html.fromHtml(getString(R.string.nice_html), FROM_HTML_MODE_LEGACY));
IMHO, this is several orders of magnitude nicer to work with :-)
August 2021 update: My original answer used Html.fromHtml(String), which was deprecated in API 24. The alternative fromHtml(String,int) form is suggested as its replacement.
FROM_HTML_MODE_LEGACY is likely to work... but one of the other flags might be a better choice for what you want to do.
On a final note, if you'd prefer to render Android Spanned text suitable for use in a TextView using Markdown syntax instead of HTML, there are now multiple thirdparty libraries to make it easy including https://noties.io/Markwon.

As the top answer here is suggesting something wrong (or at least too complicated), I feel this should be updated, although the question is quite old:
When using String resources in Android, you just have to call getString(...) from Java code or use android:text="#string/..." in your layout XML.
Even if you want to use HTML markup in your Strings, you don't have to change a lot:
The only characters that you need to escape in your String resources are:
double quotation mark: " becomes \"
single quotation mark: ' becomes \'
ampersand: & becomes & or &
That means you can add your HTML markup without escaping the tags:
<string name="my_string"><b>Hello World!</b> This is an example.</string>
However, to be sure, you should only use <b>, <i> and <u> as they are listed in the documentation.
If you want to use your HTML strings from XML, just keep on using android:text="#string/...", it will work fine.
The only difference is that, if you want to use your HTML strings from Java code, you have to use getText(...) instead of getString(...) now, as the former keeps the style and the latter will just strip it off.
It's as easy as that. No CDATA, no Html.fromHtml(...).
You will only need Html.fromHtml(...) if you did encode your special characters in HTML markup. Use it with getString(...) then. This can be necessary if you want to pass the String to String.format(...).
This is all described in the docs as well.
Edit:
There is no difference between getText(...) with unescaped HTML (as I've proposed) or CDATA sections and Html.fromHtml(...).
See the following graphic for a comparison:

Escape your HTML tags ...
<resources>
<string name="somestring">
<B>Title</B><BR/>
Content
</string>
</resources>

Android does not have a specification to indicate the type of resource string (e.g. text/plain or text/html). There is a workaround, however, that will allow the developer to specify this within the XML file.
Define a custom attribute to specify that the android:text attribute is html.
Use a subclassed TextView.
Once you define these, you can express yourself with HTML in xml files without ever having to call setText(Html.fromHtml(...)) again. I'm rather surprised that this approach is not part of the API.
This solution works to the degree that the Android studio simulator will display the text as rendered HTML.
res/values/strings.xml (the string resource as HTML)
<resources>
<string name="app_name">TextViewEx</string>
<string name="string_with_html"><![CDATA[
<em>Hello</em> <strong>World</strong>!
]]></string>
</resources>
layout.xml (only the relevant parts)
Declare the custom attribute namespace, and add the android_ex:isHtml attribute. Also use the subclass of TextView.
<RelativeLayout
...
xmlns:android_ex="http://schemas.android.com/apk/res-auto"
...>
<tv.twelvetone.samples.textviewex.TextViewEx
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="#string/string_with_html"
android_ex:isHtml="true"
/>
</RelativeLayout>
res/values/attrs.xml (define the custom attributes for the subclass)
<resources>
<declare-styleable name="TextViewEx">
<attr name="isHtml" format="boolean"/>
<attr name="android:text" />
</declare-styleable>
</resources>
TextViewEx.java (the subclass of TextView)
package tv.twelvetone.samples.textviewex;
import android.content.Context;
import android.content.res.TypedArray;
import android.support.annotation.Nullable;
import android.text.Html;
import android.util.AttributeSet;
import android.widget.TextView;
public TextViewEx(Context context, #Nullable AttributeSet attrs) {
super(context, attrs);
TypedArray a = context.obtainStyledAttributes(attrs, R.styleable.TextViewEx, 0, 0);
try {
boolean isHtml = a.getBoolean(R.styleable.TextViewEx_isHtml, false);
if (isHtml) {
String text = a.getString(R.styleable.TextViewEx_android_text);
if (text != null) {
setText(Html.fromHtml(text));
}
}
} catch (Exception e) {
e.printStackTrace();
} finally {
a.recycle();
}
}
}

Latest update:
Html.fromHtml(string);//deprecated after Android N versions..
Following code give support to android N and above versions...
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
textView.setText(Html.fromHtml(yourHtmlString,Html.FROM_HTML_MODE_LEGACY));
}
else
{
textView.setText(Html.fromHtml(yourHtmlString));
}

String termsOfCondition="<font color=#cc0029>Terms of Use </font>";
String commma="<font color=#000000>, </font>";
String privacyPolicy="<font color=#cc0029>Privacy Policy </font>";
Spanned text=Html.fromHtml("I am of legal age and I have read, understood, agreed and accepted the "+termsOfCondition+commma+privacyPolicy);
secondCheckBox.setText(text);

I have another case when I have no chance to put CDATA into the xml as I receive the string HTML from a server.
Here is what I get from a server:
<p>The quick brown <br />
fox jumps <br />
over the lazy dog<br />
</p>
It seems to be more complicated but the solution is much simpler.
private TextView textView;
protected void onCreate(Bundle savedInstanceState) {
.....
textView = (TextView) findViewById(R.id.text); //need to define in your layout
String htmlFromServer = getHTMLContentFromAServer();
textView.setText(Html.fromHtml(htmlFromServer).toString());
}
Hope it helps!
Linh

If you want to show html scrip in android app Like TextView
Please follow this code
Kotlin
var stringvalue = "Your Sting"
yourTextVew.text = if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
Html.fromHtml(stringvalue, Html.FROM_HTML_MODE_COMPACT)
} else {
Html.fromHtml(stringvalue)
}
Java
String stringvalue = "Your String";
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
yourTextVew.setText(Html.fromHtml(stringvalue, Html.FROM_HTML_MODE_COMPACT))
} else {
yourTextVew.setText( Html.fromHtml(stringvalue))
}

Related

Extracting the first formatted line from some RTF/HTML text

OK, I painted myself into a corner on this one and haven't decided the way out yet.
My web application hosts a series of documents written by users, and edited with the CLEditor editor via PrimeFaces. The documents can be any size and have any formatting the user chooses.
What I want to do is treat the first line of the document as a title, so that when I create a listing of those documents I show only the title, then the user can click on that table row to see the whole document. I show the title with
<h:outputText value="#{backBean.doc}" escape="false" />
What I did is pull the substring of the document out up until but not including the first pattern of the br tag. That works unless the user applies formatting that spans past that. The resulting string has unclosed HTML tags usually div or span) and when they are output without escaping they interfere or even blank out the rest of the page.
So I am looking for an easy solution to fix the HTML fragment. I would rather not import a huge library such as JTidy because it pulls in all sorts of dependencies I don't have right now like a DOM parser, etc. Can anyone suggest a cheaper yet robust solution? Is there any way to clean this up on the client side?
I'd suggest Jsoup.
To parse the HTML and get its <body> content, it's a matter of this oneliner:
String htmlBody = Jsoup.parse(userInput).body().html();
By the way, since you seem to intend to redisplay user-controlled HTML unescaped, I strongly recommend to whitelist it to prevent XSS. E.g.
String safeHtmlBody = Jsoup.clean(htmlBody, Whitelist.basic());
This way you can safely redisplay it without worrying about a XSS attack hole:
<h:outputText value="#{bean.safeHtmlBody}" escape="false" />
See also:
What are the pros and cons of the leading Java HTML parsers?
How to implement a possibility for user to post some html-formatted data in a safe way?
CSRF, XSS and SQL Injection attack prevention in JSF
You should be escaping the partial contents of the document somehow, otherwise users can upload documents containing HTML/JavaScript code that will compromise your site. As you can see, even simple formatting can break it. One solution could be to remove all tags (via regex, string replace, etc) and then escape the title.
I figure out the JTidy way of doing it. This seems very heavy-handed to me but I'm going with it until something better is suggested. Also if someone else is in this situation it might be useful:
public class TitleRTF {
private static final Pattern pTidy = Pattern.compile("<body>(.*)</body>");
public TitleRTF() {}
public static String getTitle(String rtfSource) {
org.w3c.tidy.Tidy tidy = new org.w3c.tidy.Tidy();
tidy.setQuiet(true);
ByteArrayInputStream bais = new ByteArrayInputStream(rtfSource.getBytes());
org.w3c.dom.Document doc = tidy.parseDOM(new BufferedInputStream(bais), null);
try {
Transformer tr = TransformerFactory.newInstance().newTransformer();
StreamResult result = new StreamResult(new StringWriter());
NodeList list = doc.getElementsByTagName("body");
if (list.getLength() > 0) {
DOMSource source = new DOMSource(list.item(0));
tr.transform(source, result);
String text = result.getWriter().toString();
Matcher m = pTidy.matcher(text);
if (m.find()) return m.group(1);
}
} catch (TransformerException ex) { }
return "(not parsable)";
}
}
One thing that needs to be added to this is a way of keeping JTidy from logging what it sees as HTML errors. The setQuiet(true) doesn't seem to do it.

Is there an easy way to strip HTML from a QString in Qt?

I have a QString with some HTML in it... is there an easy way to strip the HTML from it? I basically want just the actual text content.
<i>Test:</i><img src="blah.png" /><br> A test case
Would become:
Test: A test case
I'm curious to know if Qt has a string function or utility for this.
QString s = "<i>Test:</i><img src=\"blah.png\" /><br> A test case";
s.remove(QRegExp("<[^>]*>"));
// s == "Test: A test case"
If you don't care about performance that much then QTextDocument does a pretty good job of converting HTML to plain text.
QTextDocument doc;
doc.setHtml( htmlString );
return doc.toPlainText();
I know this question is old, but I was looking for a quick and dirty way to handle incorrect HTML. The XML parser wasn't giving good results.
You may try to iterate through the string using QXmlStreamReader class and extract all text (if you HTML string is guarantied to be well formed XML).
Something like this:
QXmlStreamReader xml(htmlString);
QString textString;
while (!xml.atEnd()) {
if ( xml.readNext() == QXmlStreamReader::Characters ) {
textString += xml.text();
}
}
but I'm unsure that its 100% valid ussage of QXmlStreamReader API since I've used it quite longe time ago and may forget something.
the situation that some html is not quite validate xml make it worse to work it out correctly.
If it's valid xml (or not too bad formated), I think QXmlStreamReader + QXmlStreamEntityResolver might not be bad idea.
Sample code in: https://github.com/ycheng/misccode/blob/master/qt_html_parse/utils.cpp
(this can be a comment, but I still don't have permission to do so)
this answer is for who read this post later and using Qt5 or later. simply escape the html characters using inbuilt functions as below.
QString str="<h1>some hedding </h1>"; // a string containing html tags.
QString esc=str.toHtmlEscaped(); //esc contains the html escaped srring.

How to add non-escaped ampersands to HTML with Nokogiri::XML::Builder

I would like to add things like bullet points "•" to HTML using the XML Builder in Nokogiri, but everything is being escaped. How do I prevent it from being escaped?
I would like the result to be:
<span>•</span>
rather than:
<span>&#8226;</span>
I'm just doing this:
xml.span {
xml.text "•\ "
}
What am I missing?
If you define
class Nokogiri::XML::Builder
def entity(code)
doc = Nokogiri::XML("<?xml version='1.0'?><root>&##{code};</root>")
insert(doc.root.children.first)
end
end
then this
builder = Nokogiri::XML::Builder.new do |xml|
xml.span {
xml.text "I can has "
xml.entity 8665
xml.text " entity?"
}
end
puts builder.to_xml
yields
<?xml version="1.0"?>
<span>I can has • entity?</span>
PS this a workaround only, for a clean solution please refer to the libxml2 documentation (Nokogiri is built on libxml2) for more help. However, even these folks admit that handling entities can be quite ..err, cumbersome sometimes.
When you're setting the text of an element, you really are setting text, not HTML source. < and & don't have any special meaning in plain text.
So just type a bullet: '•'. Of course your source code and your XML file will have to be using the same encoding for that to come out right. If your XML file is UTF-8 but your source code isn't, you'd probably have to say '\xe2\x80\xa2' which is the UTF-8 byte sequence for the bullet character as a string literal.
(In general non-ASCII characters in Ruby 1.8 are tricky. The byte-based interfaces don't mesh too well with XML's world of all-text-is-Unicode.)

Regex for unclosed HTML tags

Does someone have a regex to match unclosed HTML tags? For example, the regex would match the <b> and second <i>, but not the first <i> or the first's closing </i> tag:
<i><b>test<i>ing</i>
Is this too complex for regex? Might it require some recursive, programmatic processing?
I'm sure some regex guru can cobble something together that approximates a solution, but it's a bad idea: HTML isn't regular. Consider either a HTML parser that's capable of identifying such problems, or parsing it yourself.
Yes it requires recursive processing, and potentially quite deep (or a fancy loop of course), it is not going to be done with a regex. You could make a regex that handled a few levels deep, but not one that will work on just any html file. This is because the parser would have to remember what tags are open at any given point in the stream, and regex arent good at that.
Use a SAX parser with some counters, or use a stack with pop off/push on to keep your state. Think about how to code this game to see what I mean about html tag depth. http://en.wikipedia.org/wiki/Tower_of_Hanoi
As #Pesto said, HTML isn't regular, you would have to build html grammar rules, and apply them recursively.
If you are looking to fix HTML programatically, I have used a component called html tidy with considerable success. There are builds for it for most languages (COM+, Dotnet, PHP etc...).
If you just need to fix it manually, I'd recommend a good IDE. Visual Studio 2008 does a good job, so does the latest Dreamweaver.
No, that's to complex for a regular expression. Your problem is equivalent to test an arithmetic expression of proper usage of brackets which needs at least an pushdown automaton to success.
In your case you should split the HTML code in opening tags, closing tags and text nodes (e.g with an regular expression). Store the result in a list. Then you can iterate through node list and push every opening tag onto the stack. If you encounter a closing tag in your node list you must check that the topmost stack entry is a opening tag of the same type. Otherwise you found the html syntax error you looked for.
I've got a case where I am dealing with single, self-contained lines. The following regular expression worked for me: <[^/]+$ which matches a "<" and then anything that's not a "/".
You can use RegEx to identify all the html begin/end elements, and then enumerate with a Stack, Push new elements, and Pop the closing tags. Try this in C# -
public static bool ValidateHtmlTags(string html)
{
string expr = "(<([a-zA-Z]+)\\b[^>]*>)|(</([a-zA-Z]+) *>)";
Regex regex = new Regex(expr, RegexOptions.IgnoreCase);
var stack = new Stack<Tuple<string, string>>();
var result = new StringBuilder();
bool valid = true;
foreach (Match match in regex.Matches(html))
{
string element = match.Value;
string beginTag = match.Groups[2].Value;
string endTag = match.Groups[4].Value;
if (beginTag == "")
{
string previousTag = stack.Peek().Item1;
if (previousTag == endTag)
stack.Pop();
else
{
valid = false;
break;
}
}
else if (!element.EndsWith("/>"))
{
// Write more informative message here if desired
string message = string.Format("Char({0})", match.Index);
stack.Push(new Tuple<string, string>(beginTag, message));
}
}
if (stack.Count > 0)
valid = false;
// Alternative return stack.Peek().Item2 for more informative message
return valid;
}
I suggest using Nokogiri:
Nokogiri::HTML::DocumentFragment.parse(html).to_html

HTML rendered incorrectly in .NET

I am trying to take the string "<BR>" in VB.NET and convert it to HTML through XSLT. When the HTML comes out, though, it looks like this:
<BR>
I can only assume it goes ahead and tries to render it. Is there any way I can convert those </> back into the brackets so I get the line break I'm trying for?
Check the XSLT has:
<xsl:output method="html"/>
edit: explanation from comments
By default XSLT outputs as XML(1) which means it will escape any significant characters. You can override this in specific instances with the attribute disable-output-escaping="yes" (intro here) but much more powerful is to change the output to the explicit value of HTML which confides same benefit globally, as the following:
For script and style elements, replace any escaped characters (such
as & and >) with their actual values
(& and >, respectively).
For attributes, replace any occurrences of > with >.
Write empty elements such as <br>, <img>, and <input> without
closing tags or slashes.
Write attributes that convey information by their presence as
opposed to their value, such as
checked and selected, in minimized
form.
from a solid IBM article covering the subject, more recent coverage from stylusstudio here
If HTML output is what you desire HTML output is what you should specify.
(1) There is actually corner case where output defaults to HTML, but I don't think it's universal and it's kind of obtuse to depend on it.
Try wraping it with <xsl:text disable-output-escaping="yes"><br></xsl:text>
Don't know about XSLT but..
One workaround might be using HttpUtility.HtmlDecode from System.Web namespace.
using System;
using System.Web;
class Program
{
static void Main()
{
Console.WriteLine(HttpUtility.HtmlDecode("<br>"));
Console.ReadKey();
}
}
...
Got it! On top of the selected answer, I also did something similar to this on my string:
htmlString = htmlString.Replace("<","<")
htmlString = htmlString.Replace(">",">")
I think, though, that in the end, I may just end up using <pre> tags to preserve everything.
The string "<br>" is already HTML so you can just Response.Write("<br>").
But you meantion XSLT so I imagine there some transform going on. In that case surely the transform should be inserting it at the correct place as a node. A better question will likely get a better answer