I've got a string of html code with some additional tags inside. I need to get rid of the additional tags, but can't use fn:escapeXML because that would make the string no longer usable as html.
Example:
The value of "newLink" is set as:
test</span>">This is a <span class="help">test</span>
How can I use fn:replace (or some other jstl coding) to get rid of the inner tags?
This is what I've managed so far. Unfortunately, the last doesn't seem to match the empty tags.
<c:set var="displayValue">${fn:replace(pnxItem, 'span class=\"searchword\"', '')}</c:set>
<c:set var="displayValue">${fn:replace(displayValue, '/span', '')}</c:set>
<c:set var="displayValue">${fn:replace(displayValue, '><', '')}</c:set>
<c:set var="displayValue">${fn:replace(displayValue, '<>', '')}</c:set>
Wouldn't it be nice if you could use a regular expression? JSTL's replace function does not take a regex as an argument, as you probably already know, judging from the approach you have taken.
Luckily it is quite easy to write your own functions. I always carry along a class named StringFunctions, consisting of static functions, to be used as a custom functions library that holds, for instance, a removeTags function.
package com.whatever.viewhelpers;
public class StringFunctions {
public static String removeTags(String s) {
return s.replaceAll("\\<.*?>","");
}
// more functions ...
}
Now include this class in a tag library descriptor.
<taglib xmlns="http://java.sun.com/xml/ns/j2ee" version="2.0">
<tlib-version>1.0</tlib-version>
<short-name>myfn</short-name>
<uri>http://www.whatever.com/taglib/trlbt</uri>
<function>
<name>removeTags</name>
<function-class>
com.whatever.viewhelpers.StringFunctions
</function-class>
<function-signature>
String removeTags(java.lang.String)
</function-signature>
</function>
<!-- more functions -->
</taglib>
And use in jsp:
<%@ taglib prefix="myfn" uri="/WEB-INF/taglib/tlb.tld" %>
....
${myfn:removeTags( ... )}
I chose myfn as prefix but you are free to choose whatever suits you.
Not sure if the proposed regex fully satisfies your needs, but it demonstrates the principle of custom functions. You can do anything Java allows you to in there, you could even use jsoup to get rid of the tags.
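As a rough illustration of what the removeTags function above does, here is a minimal standalone sketch. The class name TagStripper and the null handling are assumptions added for the example; the regex is the same reluctant pattern as in the answer. Note that `.` does not match line breaks by default, so a tag that spans several lines would be left in place.

```java
import java.util.regex.Pattern;

class TagStripper {
    // Reluctant match: "<", then as few characters as possible, then ">".
    private static final Pattern TAG = Pattern.compile("<.*?>");

    /** Removes everything that looks like a tag; returns "" for null input (an assumption). */
    static String removeTags(String s) {
        if (s == null) {
            return "";
        }
        return TAG.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "This is a <span class=\"help\">test</span>";
        System.out.println(removeTags(html)); // prints: This is a test
    }
}
```

Precompiling the pattern avoids recompiling it on every call, which can matter when the function is invoked from a JSP loop.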
How can I display my list in a textarea, one item per line, with no additional spaces? For example:
this
that
the
other
Here is my attempt:
<div class="text">
<label for="output_string">Output:</label>
<textarea rows="10" cols="20">
<c:forEach var="x" items="${messagelist}">${x}</c:forEach>
</textarea>
</div>
Here's a guess (which I'll try out in just a sec in one of my own pages):
<c:forEach var='x' items='${messagelist}'><c:out value='${x}\r\n'/></c:forEach>
edit — no that doesn't seem to work at all. However, what did work was for me to add a message catalog entry like this:
linebreak={0}\r\n
Then you can use <fmt:message key="linebreak"><fmt:param value="${x}"/></fmt:message> to produce the string terminated by line breaks.
Note that JSP will put spaces before the first entry according to the indentation in your .jsp source file before the <c:forEach>, so you'll have to line everything up at the left edge if you don't want that.
If I had to do this a lot, I'd write an EL add-on function of my own to echo back a string followed by CRLF.
edit — If you want to write an EL add-on, you need two things:
The function itself, which should be a public static method of some class. I keep a class around called "ELFunctions" for most of mine. You can arrange them any way you want.
A ".tld" file, if you don't already have one. It should end up in your webapp somewhere under "WEB-INF". Mine goes in a subdirectory called "tld", but you can put it anywhere.
So you would write a little function like this, in some class:
public static String linebreak(final String msg) {
return msg + "\r\n";
}
Then your ".tld" file would look like this (assuming it's the only thing you've got; if you have an existing ".tld" file just add the clause):
<taglib xmlns="http://java.sun.com/xml/ns/j2ee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-jsptaglibrary_2_0.xsd" version="2.0">
<description>Your Favorite Description</description>
<display-name>Make Something Up</display-name>
<tlib-version>4.0</tlib-version>
<short-name>whatever</short-name>
<uri>http://yourdomain.com/tld/whatever</uri>
<function>
<description>
Return a string augmented with trailing line break (CR - LF pair)
</description>
<name>linebreak</name>
<function-class>your.package.name.YourClass</function-class>
<function-signature>
java.lang.String linebreak(java.lang.String)
</function-signature>
</function>
</taglib>
(Boy, XML is so annoying.) Now somewhere you probably already have a little file that pulls in taglibs for your pages (for <c:...> tags at least). In there, or at the top of any page, add a line like this:
<%@ taglib prefix="whatever" uri='http://yourdomain.com/tld/whatever' %>
I think that the JSP runtime searches for ".tld" files by looking through the WEB-INF subtree, and in .jar files in WEB-INF/lib, matching by that "uri" string. Anyway, once you've done that, in your JSP file you can say:
<c:forEach var='x' items='${messagelist}'>${whatever:linebreak(x)}</c:forEach>
and it'll invoke your function.
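For reference, the Java side of this can be sketched as a small standalone class. The class name ELFunctions comes from the answer; the joinLines helper is a hypothetical addition (not part of the answer) for the case where you would rather build the whole textarea content in one call and avoid a trailing line break.

```java
import java.util.Arrays;
import java.util.List;

class ELFunctions {
    /** Appends a CR-LF pair, as in the linebreak function above. */
    static String linebreak(String msg) {
        return msg + "\r\n";
    }

    /** Hypothetical helper: one item per line, no line break after the last item. */
    static String joinLines(List<String> items) {
        return String.join("\r\n", items);
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("this", "that", "the", "other");

        // Equivalent of calling linebreak(x) once per item in the forEach loop.
        StringBuilder perItem = new StringBuilder();
        for (String m : messages) {
            perItem.append(linebreak(m));
        }
        System.out.print(perItem);

        // Same lines, built in a single call.
        System.out.println(joinLines(messages));
    }
}
```

A function like joinLines would need its own function entry in the ".tld" file, with a signature taking java.util.List, before it could be called from EL.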
<c:set var="xv"></c:set>
<c:forEach items="${messagelist}" var="x" varStatus="idx">
<c:if test="${not empty x}">
<c:choose>
<c:when test="${idx.first}"><c:set var="xv" value="${x}"></c:set></c:when>
<c:otherwise><c:set var="xv" value="${xv},${x}"></c:set></c:otherwise>
</c:choose>
</c:if>
</c:forEach>
<textarea cols="45" rows="5">${xv}</textarea>
I have some fixed strings inside my strings.xml, something like:
<resources>
<string name="somestring">
<B>Title</B><BR/>
Content
</string>
</resources>
and in my layout I've got a TextView which I'd like to fill with the html-formatted string.
<TextView android:id="@+id/formattedtext"
android:layout_width="fill_parent"
android:layout_height="wrap_content"
android:text="@string/htmlstring"/>
if I do this, the content of formattedtext is just the content of somestring stripped of any html tags and thus unformatted.
I know that it is possible to set the formatted text programmatically with
.setText(Html.fromHtml(somestring));
because I use this in other parts of my program where it is working as expected.
To call this function I need an Activity, but at the moment my layout is just a simple more or less static view in plain XML and I'd prefer to leave it that way, to save me from the overhead of creating an Activity just to set some text.
Am I overlooking something obvious? Is it not possible at all? Any help or workarounds welcome!
Edit: Just tried some things and it seems that HTML formatting in XML has some constraints:
tags must be written lowercase
some tags which are mentioned here do not work, e.g. <br/> (it's possible to use \n instead)
Just in case anybody finds this, there's a nicer alternative that's not documented (I tripped over it after searching for hours, and finally found it in the bug list for the Android SDK itself). You CAN include raw HTML in strings.xml, as long as you wrap it in
<![CDATA[ ...raw html... ]]>
Edge Cases:
Characters like apostrophe ('), double-quote ("), and ampersand (&) only need to be escaped if you want them to appear in the rendered text AS themselves, but they COULD be plausibly interpreted as HTML.
' and " can be represented as \' and \", or &apos; and &quot;.
< and > always need to be escaped as &lt; and &gt; if you literally want them to render as '<' and '>' in the text.
Ampersand (&) is a little more complicated.
Ampersand followed by whitespace renders as ampersand.
Ampersand followed by one or more characters that don't form a valid HTML entity code renders as ampersand followed by those characters. So... &qqq; renders as &qqq;, but &lt;1 renders as <1.
Example:
<string name="nice_html">
<![CDATA[
<p>This is a html-formatted \"string\" with <b>bold</b> and <i>italic</i> text</p>
<p>This is another paragraph from the same \'string\'.</p>
<p>To be clear, 0 &lt; 1, & 10 &gt; 1</p>
]]>
</string>
Then, in your code:
TextView foo = (TextView)findViewById(R.id.foo);
foo.setText(Html.fromHtml(getString(R.string.nice_html), Html.FROM_HTML_MODE_LEGACY));
IMHO, this is several orders of magnitude nicer to work with :-)
August 2021 update: My original answer used Html.fromHtml(String), which was deprecated in API 24. The alternative fromHtml(String,int) form is suggested as its replacement.
FROM_HTML_MODE_LEGACY is likely to work... but one of the other flags might be a better choice for what you want to do.
On a final note, if you'd prefer to render Android Spanned text suitable for use in a TextView using Markdown syntax instead of HTML, there are now multiple third-party libraries that make it easy, including https://noties.io/Markwon.
As the top answer here is suggesting something wrong (or at least too complicated), I feel this should be updated, although the question is quite old:
When using String resources in Android, you just have to call getString(...) from Java code or use android:text="@string/..." in your layout XML.
Even if you want to use HTML markup in your Strings, you don't have to change a lot:
The only characters that you need to escape in your String resources are:
double quotation mark: " becomes \"
single quotation mark: ' becomes \'
ampersand: & becomes &#38; or &amp;
That means you can add your HTML markup without escaping the tags:
<string name="my_string"><b>Hello World!</b> This is an example.</string>
However, to be sure, you should only use <b>, <i> and <u> as they are listed in the documentation.
If you want to use your HTML strings from XML, just keep on using android:text="@string/...", it will work fine.
The only difference is that, if you want to use your HTML strings from Java code, you have to use getText(...) instead of getString(...) now, as the former keeps the style and the latter will just strip it off.
It's as easy as that. No CDATA, no Html.fromHtml(...).
You will only need Html.fromHtml(...) if you did encode your special characters in HTML markup. Use it with getString(...) then. This can be necessary if you want to pass the String to String.format(...).
This is all described in the docs as well.
Edit:
There is no difference between getText(...) with unescaped HTML (as I've proposed) or CDATA sections and Html.fromHtml(...).
See the following graphic for a comparison:
Escape your HTML tags ...
<resources>
<string name="somestring">
&lt;B&gt;Title&lt;/B&gt;&lt;BR/&gt;
Content
</string>
</resources>
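If the markup comes from somewhere else and has to be pasted into strings.xml in escaped form, the escaping can be done mechanically. The following is a minimal sketch in plain Java; the class and method names are made up for illustration. A string escaped this way comes back from getString(...) as raw HTML text and would still need Html.fromHtml(...) to be rendered, as described in the other answers.

```java
class ResourceEscaper {
    /**
     * Replaces &, < and > with XML entities.
     * The ampersand is handled first so that the entities produced by the
     * later replacements are not escaped a second time.
     */
    static String escapeMarkup(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeMarkup("<B>Title</B><BR/>"));
        // prints: &lt;B&gt;Title&lt;/B&gt;&lt;BR/&gt;
    }
}
```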
Android does not have a specification to indicate the type of resource string (e.g. text/plain or text/html). There is a workaround, however, that will allow the developer to specify this within the XML file.
Define a custom attribute to specify that the android:text attribute is html.
Use a subclassed TextView.
Once you define these, you can express yourself with HTML in xml files without ever having to call setText(Html.fromHtml(...)) again. I'm rather surprised that this approach is not part of the API.
This solution works to the degree that the Android studio simulator will display the text as rendered HTML.
res/values/strings.xml (the string resource as HTML)
<resources>
<string name="app_name">TextViewEx</string>
<string name="string_with_html"><![CDATA[
<em>Hello</em> <strong>World</strong>!
]]></string>
</resources>
layout.xml (only the relevant parts)
Declare the custom attribute namespace, and add the android_ex:isHtml attribute. Also use the subclass of TextView.
<RelativeLayout
...
xmlns:android_ex="http://schemas.android.com/apk/res-auto"
...>
<tv.twelvetone.samples.textviewex.TextViewEx
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:text="@string/string_with_html"
android_ex:isHtml="true"
/>
</RelativeLayout>
res/values/attrs.xml (define the custom attributes for the subclass)
<resources>
<declare-styleable name="TextViewEx">
<attr name="isHtml" format="boolean"/>
<attr name="android:text" />
</declare-styleable>
</resources>
TextViewEx.java (the subclass of TextView)
package tv.twelvetone.samples.textviewex;
import android.content.Context;
import android.content.res.TypedArray;
import android.support.annotation.Nullable;
import android.text.Html;
import android.util.AttributeSet;
import android.widget.TextView;
public class TextViewEx extends TextView {
public TextViewEx(Context context, @Nullable AttributeSet attrs) {
super(context, attrs);
TypedArray a = context.obtainStyledAttributes(attrs, R.styleable.TextViewEx, 0, 0);
try {
boolean isHtml = a.getBoolean(R.styleable.TextViewEx_isHtml, false);
if (isHtml) {
String text = a.getString(R.styleable.TextViewEx_android_text);
if (text != null) {
setText(Html.fromHtml(text));
}
}
} catch (Exception e) {
e.printStackTrace();
} finally {
a.recycle();
}
}
}
Latest update:
Html.fromHtml(string); // deprecated as of Android N (API 24)
The following code supports Android N and above as well as older versions:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
textView.setText(Html.fromHtml(yourHtmlString,Html.FROM_HTML_MODE_LEGACY));
}
else
{
textView.setText(Html.fromHtml(yourHtmlString));
}
String termsOfCondition="<font color=#cc0029>Terms of Use </font>";
String commma="<font color=#000000>, </font>";
String privacyPolicy="<font color=#cc0029>Privacy Policy </font>";
Spanned text=Html.fromHtml("I am of legal age and I have read, understood, agreed and accepted the "+termsOfCondition+commma+privacyPolicy);
secondCheckBox.setText(text);
I have another case when I have no chance to put CDATA into the xml as I receive the string HTML from a server.
Here is what I get from a server:
<p>The quick brown <br />
fox jumps <br />
over the lazy dog<br />
</p>
It seems to be more complicated but the solution is much simpler.
private TextView textView;
protected void onCreate(Bundle savedInstanceState) {
.....
textView = (TextView) findViewById(R.id.text); //need to define in your layout
String htmlFromServer = getHTMLContentFromAServer();
textView.setText(Html.fromHtml(htmlFromServer).toString());
}
Hope it helps!
Linh
If you want to show an HTML string in an Android view such as a TextView,
follow this code:
Kotlin
val stringvalue = "Your String"
yourTextVew.text = if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
Html.fromHtml(stringvalue, Html.FROM_HTML_MODE_COMPACT)
} else {
Html.fromHtml(stringvalue)
}
Java
String stringvalue = "Your String";
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
yourTextVew.setText(Html.fromHtml(stringvalue, Html.FROM_HTML_MODE_COMPACT));
} else {
yourTextVew.setText(Html.fromHtml(stringvalue));
}
Does someone have a regex to match unclosed HTML tags? For example, the regex would match the <b> and second <i>, but not the first <i> or the first's closing </i> tag:
<i><b>test<i>ing</i>
Is this too complex for regex? Might it require some recursive, programmatic processing?
I'm sure some regex guru can cobble something together that approximates a solution, but it's a bad idea: HTML isn't regular. Consider either a HTML parser that's capable of identifying such problems, or parsing it yourself.
Yes, it requires recursive processing, potentially quite deep (or a fancy loop, of course); it is not going to be done with a regex. You could make a regex that handles a few levels of nesting, but not one that will work on just any HTML file. This is because the parser has to remember which tags are open at any given point in the stream, and regexes aren't good at that.
Use a SAX parser with some counters, or use a stack with pop off/push on to keep your state. Think about how to code this game to see what I mean about html tag depth. http://en.wikipedia.org/wiki/Tower_of_Hanoi
As @Pesto said, HTML isn't regular; you would have to build HTML grammar rules and apply them recursively.
If you are looking to fix HTML programmatically, I have used a component called html tidy with considerable success. There are builds of it for most languages (COM+, Dotnet, PHP etc...).
If you just need to fix it manually, I'd recommend a good IDE. Visual Studio 2008 does a good job, so does the latest Dreamweaver.
No, that's too complex for a regular expression. Your problem is equivalent to testing whether the brackets in an arithmetic expression are properly balanced, which needs at least a pushdown automaton to succeed.
In your case you should split the HTML code into opening tags, closing tags and text nodes (e.g. with a regular expression). Store the result in a list. Then you can iterate through the node list and push every opening tag onto a stack. If you encounter a closing tag in your node list, you must check that the topmost stack entry is an opening tag of the same type and, if so, pop it. Otherwise you have found the HTML syntax error you were looking for.
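The procedure described above can be sketched in Java roughly as follows. This is only an illustration under simplifying assumptions: the class name, the tag regex and the message format are made up, comments and CDATA are not handled, and void elements written without a slash (such as a bare `<br>`) would be reported as unclosed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBalanceChecker {
    // Group 1: "/" for a closing tag; group 2: tag name; group 3: "/" for a self-closing tag.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*?(/?)>");

    /** Returns a list of problems; an empty list means every tag was matched. */
    static List<String> check(String html) {
        Deque<String> open = new ArrayDeque<>();   // last element = most recently opened tag
        List<String> problems = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (closing) {
                if (!open.isEmpty() && open.peekLast().equals(name)) {
                    open.removeLast();             // matches the innermost open tag
                } else {
                    problems.add("unexpected </" + name + "> at index " + m.start());
                }
            } else if (!selfClosing) {
                open.addLast(name);
            }
        }
        for (String name : open) {                 // whatever is left was never closed
            problems.add("unclosed <" + name + ">");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("<i><b>test<i>ing</i>"));
        // prints: [unclosed <i>, unclosed <b>]
    }
}
```

Note that a stack-based check pairs the `</i>` with the innermost open `<i>`, so for the example in the question it reports the first `<i>` and the `<b>` as unclosed rather than the `<b>` and the second `<i>`.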
I've got a case where I am dealing with single, self-contained lines. The following regular expression worked for me: <[^/]+$ which matches a "<" and then anything that's not a "/".
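To make the behaviour of that pattern concrete, here is a small Java sketch (class and method names are invented for the example). It reports a line as suspicious when a "<" is followed by no "/" at all up to the end of the line, which also means a slash anywhere later on the line, for instance inside an attribute value, hides the problem.

```java
import java.util.regex.Pattern;

class UnclosedLineCheck {
    // "<" followed only by non-slash characters up to the end of the input.
    private static final Pattern UNCLOSED = Pattern.compile("<[^/]+$");

    static boolean looksUnclosed(String line) {
        return UNCLOSED.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(looksUnclosed("<b>bold"));             // true
        System.out.println(looksUnclosed("<b>bold</b>"));         // false
        System.out.println(looksUnclosed("<a href=\"x/y\">go"));  // false: the slash in the URL masks it
    }
}
```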
You can use RegEx to identify all the html begin/end elements, and then enumerate with a Stack, Push new elements, and Pop the closing tags. Try this in C# -
public static bool ValidateHtmlTags(string html)
{
string expr = "(<([a-zA-Z]+)\\b[^>]*>)|(</([a-zA-Z]+) *>)";
Regex regex = new Regex(expr, RegexOptions.IgnoreCase);
var stack = new Stack<Tuple<string, string>>();
var result = new StringBuilder();
bool valid = true;
foreach (Match match in regex.Matches(html))
{
string element = match.Value;
string beginTag = match.Groups[2].Value;
string endTag = match.Groups[4].Value;
if (beginTag == "")
{
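// A closing tag with nothing open is invalid; Peek() would throw on an empty stack
string previousTag = stack.Count > 0 ? stack.Peek().Item1 : null;
if (previousTag == endTag)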
stack.Pop();
else
{
valid = false;
break;
}
}
else if (!element.EndsWith("/>"))
{
// Write more informative message here if desired
string message = string.Format("Char({0})", match.Index);
stack.Push(new Tuple<string, string>(beginTag, message));
}
}
if (stack.Count > 0)
valid = false;
// Alternative return stack.Peek().Item2 for more informative message
return valid;
}
I suggest using Nokogiri:
Nokogiri::HTML::DocumentFragment.parse(html).to_html
I am trying to take the string "<BR>" in VB.NET and convert it to HTML through XSLT. When the HTML comes out, though, it looks like this:
&lt;BR&gt;
I can only assume it goes ahead and tries to render it. Is there any way I can convert those &lt;/&gt; back into the brackets so I get the line break I'm trying for?
Check the XSLT has:
<xsl:output method="html"/>
edit: explanation from comments
By default XSLT outputs as XML(1), which means it will escape any significant characters. You can override this in specific instances with the attribute disable-output-escaping="yes" (intro here), but it is much more powerful to set the output method explicitly to HTML, which confers the same benefit globally, along with the following:
For script and style elements, replace any escaped characters (such
as &amp; and &gt;) with their actual values
(& and >, respectively).
For attributes, replace any occurrences of &gt; with >.
Write empty elements such as <br>, <img>, and <input> without
closing tags or slashes.
Write attributes that convey information by their presence as
opposed to their value, such as
checked and selected, in minimized
form.
from a solid IBM article covering the subject, more recent coverage from stylusstudio here
If HTML output is what you desire HTML output is what you should specify.
(1) There is actually corner case where output defaults to HTML, but I don't think it's universal and it's kind of obtuse to depend on it.
Try wrapping it with <xsl:text disable-output-escaping="yes">&lt;br&gt;</xsl:text>
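The effect of disable-output-escaping can be tried out with the XSLT processor that ships with the JDK. The sketch below is in Java rather than VB.NET, and the stylesheet, the sample document and the class name are all made up for the experiment; it uses xsl:value-of instead of xsl:text, which accepts the same attribute. Whether the attribute is honoured depends on the processor and on serializing to a stream, so treat it as an experiment rather than a guarantee.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputEscapingDemo {
    /** Runs a one-template stylesheet over a fixed sample document and returns the output. */
    static String transform(boolean disableEscaping) {
        String xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html'/>"
            + "<xsl:template match='/'>"
            + "<p><xsl:value-of select='/doc'"
            + (disableEscaping ? " disable-output-escaping='yes'" : "")
            + "/></p>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        // The text node of <doc> is the literal string: one<BR>two
        String xml = "<doc>one&lt;BR&gt;two</doc>";
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("escaped:   " + transform(false));
        System.out.println("unescaped: " + transform(true));
    }
}
```

With escaping left on, the angle brackets in the text node are written as entities; with it disabled, the text is written through as markup.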
Don't know about XSLT but..
One workaround might be using HttpUtility.HtmlDecode from System.Web namespace.
using System;
using System.Web;
class Program
{
static void Main()
{
Console.WriteLine(HttpUtility.HtmlDecode("&lt;br&gt;"));
Console.ReadKey();
}
}
...
Got it! On top of the selected answer, I also did something similar to this on my string:
htmlString = htmlString.Replace("&lt;","<")
htmlString = htmlString.Replace("&gt;",">")
I think, though, that in the end, I may just end up using <pre> tags to preserve everything.
The string "<br>" is already HTML so you can just Response.Write("<br>").
But you mention XSLT, so I imagine there is some transform going on. In that case, surely the transform should be inserting it at the correct place as a node. A better question will likely get a better answer.
C#: What is a good Regex to parse hyperlinks and their description?
Please consider case insensitivity, white-space and use of single quotes (instead of double quotes) around the HREF tag.
Please also consider obtaining hyperlinks which have other tags within the <a> tags such as <b> and <i>.
As long as there are no nested tags (and no line breaks), the following variant works well:
<a\s+href=(?:"([^"]+)"|'([^']+)').*?>(.*?)</a>
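For readers who want to try the pattern outside .NET, here is a minimal Java sketch using the same expression with case-insensitive matching. The class name, the method name and the sample inputs are illustrative assumptions; as stated above, it only copes with links that have href as the first attribute, no nested anchor tags and no line breaks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Group 1: double-quoted URL; group 2: single-quoted URL; group 3: link text.
    private static final Pattern LINK = Pattern.compile(
        "<a\\s+href=(?:\"([^\"]+)\"|'([^']+)').*?>(.*?)</a>",
        Pattern.CASE_INSENSITIVE);

    /** Returns {url, text} for the first link found, or null if there is none. */
    static String[] firstLink(String html) {
        Matcher m = LINK.matcher(html);
        if (!m.find()) {
            return null;
        }
        String url = m.group(1) != null ? m.group(1) : m.group(2);
        return new String[] { url, m.group(3) };
    }

    public static void main(String[] args) {
        String[] link = firstLink("<A HREF='index.html'>Home</A>");
        System.out.println(link[0] + " -> " + link[1]); // prints: index.html -> Home
    }
}
```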
As soon as nested tags come into play, regular expressions are unfit for parsing. However, you can still use them by applying more advanced features of modern interpreters (depending on your regex machine). E.g. .NET regular expressions use a stack; I found this:
(?:<a.*?href=[""'](?<url>.*?)[""'].*?>)(?<name>(?><a[^<]*>(?<DEPTH>)|</a>(?<-DEPTH>)|.)+)(?(DEPTH)(?!))(?:</a>)
Source: http://weblogs.asp.net/scottcate/archive/2004/12/13/281955.aspx
See this example from StackOverflow: Regular expression for parsing links from a webpage?
Using The HTML Agility Pack you can parse the html, and extract details using the semantics of the HTML, instead of a broken regex.
I found this but apparently these guys had some problems with it.
Edit: (It works!)
I have now done my own testing and found that it works, I don't know C# so I can't give you a C# answer but I do know PHP and here's the matches array I got back from running it on this:
Text
array(3) { [0]=> string(52) "Text" [1]=> string(15) "pages/index.php" [2]=> string(4) "Text" }
I have a regex that handles most cases, though I believe it does match HTML within a multiline comment.
It's written using the .NET syntax, but should be easily translatable.
Just going to throw this snippet out there now that I have it working. This is a less greedy version of one suggested earlier. The original wouldn't work if the input had multiple hyperlinks. The code below will allow you to loop through all the hyperlinks:
static Regex rHref = new Regex(@"<a.*?href=[""'](?<url>[^""^']+[.]*?)[""'].*?>(?<keywords>[^<]+[.]*?)</a>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
public void ParseHyperlinks(string html)
{
MatchCollection mcHref = rHref.Matches(html);
foreach (Match m in mcHref)
AddKeywordLink(m.Groups["keywords"].Value, m.Groups["url"].Value);
}
Here is a regular expression that will match the balanced tags.
(?:<a.*?href=[""'](?<url>.*?)[""'].*?>)(?<name>(?><a[^<]*>(?<DEPTH>)|</a>(?<-DEPTH>)|.)+)(?(DEPTH)(?!))(?:</a>)