How to remove some HTML tags?

I'm trying to find a regex for VBScript to remove some html tags and their content from a string.
The string is,
<H2>Title</H2><SPAN class=tiny>Some
text here</SPAN><LI>Some list
here</LI><SCRITP>Some script
here</SCRITP><P>Some text here</P>
Now, I'd like to EXCLUDE <SPAN class=tiny>Some text here</SPAN> and <SCRITP>Some script here</SCRITP>
Maybe someone has a simple solution for this, thanks.

This should do the trick in VBScript:
Dim myRegExp, ResultString
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Global = True
myRegExp.Pattern = "<span class=tiny>[\s\S]*?</span>|<script>[\s\S]*?</script>"
ResultString = myRegExp.Replace(SubjectString, "")
SubjectString is the variable with your original HTML and ResultString receives the HTML with all occurrences of the two tags removed.
Note: I'm assuming scritp in your sample is a typo for script. If not, adjust my code sample accordingly.
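For anyone who wants to try the pattern outside VBScript, here is a minimal JavaScript sketch of the same idea. The function name removeTags and the slightly more tolerant pattern (optional quotes, extra attributes, and the misspelled scritp tag) are assumptions for illustration, not part of the answer above. Like the original pattern, it does not handle nested spans.

```javascript
// Hypothetical helper: removes <span class=tiny>...</span> and
// <script>/<scritp> blocks together with their content.
function removeTags(html) {
  var pattern = /<span\s+class=["']?tiny\b["']?[^>]*>[\s\S]*?<\/span>|<(script|scritp)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;
  return html.replace(pattern, "");
}

// The sample string from the question, with its line breaks.
var sampleHtml =
  "<H2>Title</H2><SPAN class=tiny>Some\ntext here</SPAN><LI>Some list\nhere</LI>" +
  "<SCRITP>Some script\nhere</SCRITP><P>Some text here</P>";

console.log(removeTags(sampleHtml));
```

The lazy quantifier [\s\S]*? matters here: it stops at the first closing tag instead of swallowing everything up to the last one.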

You could do this much more easily using CSS (note that this only hides the element; it stays in the markup):
span.tiny {
    display: none;
}
or using jQuery:
$("span.tiny").hide();

I think you want this:
$(function(){
    $('span.tiny').remove();
    $('script').remove();
});

Related

Libgdx: How to show HTML text in a label?

I have a string like this:
"noun<br> an expression of greeting <br>- every morning they exchanged polite hellos<br> <font color=dodgerblue> ••</font> Syn: hullo, hi, howdy, how-do-you-do<be>"
I want to show it in a label as rich text. For example, instead of showing the <br> tags, the text must go to the next line.
In Android we can do that with:
Html.fromHtml(myHtmlString)
but I don't know how to do it in libgdx.
I tried to use Jsoup, but it removes all tags and does not go to the next line for a <br> tag, for example.
Jsoup.parse(myHtmlString).text()
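Jsoup.parse returns a Document, i.e. a tree of elements, not a single string. Calling text() on it concatenates the text of every element and normalizes whitespace, so the <br> tags do not turn into line breaks. You can assemble the string yourself by walking the elements, or get the parsed markup back as a string with: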
Document doc = Jsoup.parse(yourHtmlInput);
String htmlString = doc.toString();
String htmlText = "<p>This is an <strong>Example</strong></p>";
//this will convert your HTML text into normal text
String normalText = Jsoup.parse(htmlText).text();
In Kotlin I use this code:
var definition = "my html string"
definition = definition.replace("<br>", "\n")
definition = definition.replace("<[^>]*>".toRegex(), "")

Html <pre> not formatting/rendering text correctly [duplicate]

I'm using Prototype's PeriodicalUpdater to update a div with the results of an ajax call. As I understand it, the div is updated by setting its innerHTML.
The div is wrapped in a <pre> tag. In Firefox, the <pre> formatting works as expected, but in IE, the text all ends up on one line.
Here's some sample code that illustrates the problem. In Firefox, "a b c" is on a different line than "d e f"; in IE they are on the same line.
<html>
<head>
<title>IE preformatted text sucks</title>
</head>
<body>
<pre id="test">
a b c
d e f
</pre>
<script type="text/javascript"><!--
var textContent = document.getElementById("test").innerText;
textContent = textContent.replace("a", "<span style=\"color:red;\">a</span>");
document.getElementById("test").style.whiteSpace = "pre";
document.getElementById("test").innerHTML = textContent;
--></script>
</body>
</html>
Anyone know of a way to get around this problem?
Setting innerHTML fires up an HTML parser, which ignores excess whitespace including hard returns. If you change your method to include the <pre> tag in the string, it works fine because the HTML parser retains the hard returns.
You can see this in action by doing a View Generated Source after you run your sample page:
<PRE id="test" style="WHITE-SPACE: pre"><SPAN style="COLOR: red">a</SPAN> b c d e f </PRE>
You can see here that the hard return is no longer part of the content of the <pre> tag.
Generally, you'll get more consistent results by using DOM methods to construct dynamic content, especially when you care about subtle things like normalization of whitespace. However, if you're set on using innerHTML for this, there is an IE workaround: assign to the outerHTML property instead, and include the enclosing tags in the string.
if(test.outerHTML)
test.outerHTML = '<pre id="test">'+textContent+'</pre>';
else
test.innerHTML = textContent;
This workaround and more discussion can be found here: Inserting a newline into a pre tag (IE, Javascript)
Or you could set innerText where the browser supports it:
if (el.innerText) {
el.innerText = val;
} else {
el.innerHTML = val;
}
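One caveat with the check above: el.innerText is an empty string, and therefore falsy, whenever the element is currently empty, so the code would fall through to innerHTML in exactly the case where the element is being filled for the first time. A minimal sketch that tests for the existence of the property instead (setElementText is a hypothetical name, not from the answer):

```javascript
// Hypothetical helper: prefers innerText when the property exists at all,
// even if its current value is an empty string.
function setElementText(el, val) {
  if (typeof el.innerText !== "undefined") {
    el.innerText = val;
  } else {
    // Fallback: val is parsed as HTML here, so it is not escaped.
    el.innerHTML = val;
  }
}

// In a browser: setElementText(document.getElementById("test"), "a b c\nd e f");
```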
I don't know if this has been suggested before, but the solution I found for preserving white space, newlines, etc. when assigning innerHTML on a 'pre' tag is to wrap the text in another 'pre' tag:
<pre id="pretag"></pre>
TextToInsert = "lots of text with spaces and newlines";
document.getElementById("pretag").innerHTML = "<pre>" + TextToInsert + "</pre>";
It seems IE does parse the text before applying the innerHTML. The extra 'pre' tag causes the parser to leave the text inside it unparsed, which makes sense since that's what the parser is supposed to do. This also works in Firefox.
It could also be rewritten 'the Python way', i.e.:
(el.innerText && (el.innerText = val)) || (el.innerHTML = val);

Extracting Non String Text HTML VBA

I am trying to get a date out of HTML using VBA in Excel, and I am having trouble finding a way to extract the text that I want. It appears as:
<SPAN id=ctl00_ContentPlaceHolder1_lblDateCreated2>5/22/2012 8:14:08 PM</SPAN>
I want to extract the 5/22/2012 8:14:08, but as it is not a string and sits between the angle brackets, I don't know exactly how to do it. Any tips?
I figured out that I was using ".innerText" incorrectly, and I was able to get it working with the following snippet.
Doc.getElementById("ctl00_ContentPlaceHolder1_lblDateCreated2").innerText
You could do this in VBA with split:
theString = "<SPAN id=ctl00_ContentPlaceHolder1_lblDateCreated2>5/22/2012 8:14:08 PM</SPAN>"
Temp = Split(theString, "ContentPlaceHolder1_lblDateCreated2>")(1)
Final = Split(Temp, "</")(0)
The first Split returns an array of two parts:
(0) = "<SPAN id=ctl00_"
(1) = "5/22/2012 8:14:08 PM</SPAN>"
Temp takes element (1). Next we Split Temp on "</" and keep element (0), which removes the closing SPAN tag and leaves just the date and time.
I think you're just looking for a MID() formula. If that URL/Span part is in A1, put this in A2 (or wherever):
=MID(A1,SEARCH(">",A1)+1,FIND("</",A1)-FIND(">",A1)-1)

JSFL: convert text from a textfield to a HTML-format string

I've got a deceptively simple question: how can I get the text from a text field AND include the formatting? Going through the usual docs, I found out it is possible to get the text only. It is also possible to get the text formatting, but this only works if the entire text field uses a single kind of formatting. I need the precise formatting so that I can convert it to a string with HTML tags.
Personally I need this so I can pass it to a custom-made text field component that uses HTML for formatting. But it could also be used to simply export the contents of any text field to any other format. This could be of interest to others out there, too.
Looking for a solution elsewhere I found this:
http://labs.thesedays.com/blog/2010/03/18/jsfl-rich-text/
This seems to do the reverse of what I need: it converts HTML to Flash text. My own attempts to reverse it have not been successful so far. Maybe someone else sees an easy way to reverse it that I'm missing?
There might also be other solutions. One might be to get the EXACT data of the text field, which should include formatting tags of some kind (XML, judging by the contents of the stored FLA file), and then remove or convert those tags. But I have no idea how to do this, if it is possible at all.
Another option is to cycle through every character using startIndex and endIndex, storing each kind of formatting in an array, and then apply the formatting to each character. But this will result in excess tags, especially for hyperlinks! So can anybody help me with this?
A bit late to the party, but the following function takes a JSFL static text element as input and returns an HTML string (using the Flash-friendly <font> tag) based on the styles found in its textRuns array. It does a bit of basic regex work to clean up double spaces and similar debris and to convert \r and \n to <br/> tags. It's probably not perfect, but hopefully you can see what's going on easily enough to change or fix it.
function tfToHTML(p_tf)
{
    var textRuns = p_tf.textRuns;
    var html = "";
    for ( var i=0; i<textRuns.length; i++ )
    {
        var textRun = textRuns[i];
        var chars = textRun.characters;

        // convert line breaks to <br/> tags and tidy up whitespace
        chars = chars.replace(/\n/g,"<br/>");
        chars = chars.replace(/\r/g,"<br/>");
        chars = chars.replace(/ {2,}/g," ");          // collapse runs of spaces into one
        chars = chars.replace(/\. <br\/>/g,".<br/>"); // drop the space between a full stop and a break

        // read the formatting of this run
        var attrs = textRun.textAttrs;
        var font = attrs.face;
        var size = attrs.size;
        var bold = attrs.bold;
        var italic = attrs.italic;
        var colour = attrs.fillColor;

        if ( bold )
        {
            chars = "<b>"+chars+"</b>";
        }
        if ( italic )
        {
            chars = "<i>"+chars+"</i>";
        }
        chars = "<font size=\""+size+"\" face=\""+font+"\" color=\""+colour+"\">"+chars+"</font>";
        html += chars;
    }
    return html;
}
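On the question's worry about excess tags: if neighbouring runs differ only in attributes that the function ignores, it will emit several identical <font> wrappers in a row. A possible pre-processing step, sketched here under that assumption, is to merge consecutive runs whose face, size, bold, italic and fillColor all match before building the HTML. The name mergeTextRuns and the plain objects standing in for real text runs are hypothetical.

```javascript
// Hypothetical helper: merges consecutive runs that share the five
// attributes used by tfToHTML. Returns new plain objects and leaves
// the original runs untouched.
function mergeTextRuns(textRuns) {
  var keys = ["face", "size", "bold", "italic", "fillColor"];
  var merged = [];
  for (var i = 0; i < textRuns.length; i++) {
    var run = textRuns[i];
    var last = merged.length ? merged[merged.length - 1] : null;
    var same = last !== null;
    for (var k = 0; same && k < keys.length; k++) {
      same = last.textAttrs[keys[k]] === run.textAttrs[keys[k]];
    }
    if (same) {
      last.characters += run.characters;   // extend the previous run
    } else {
      merged.push({ characters: run.characters, textAttrs: run.textAttrs });
    }
  }
  return merged;
}

// Stand-in data shaped like the runs used above (not real JSFL objects).
var plainStyle = { face: "Arial", size: 12, bold: false, italic: false, fillColor: "#000000" };
var boldStyle  = { face: "Arial", size: 12, bold: true,  italic: false, fillColor: "#000000" };
var demoRuns = [
  { characters: "Hello ", textAttrs: plainStyle },
  { characters: "world",  textAttrs: plainStyle },
  { characters: "!",      textAttrs: boldStyle }
];
console.log(mergeTextRuns(demoRuns).length);
```

The merged array has the same shape the loop in tfToHTML reads (characters and textAttrs), so the loop could iterate over it instead of p_tf.textRuns.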

replace keyword within html string

I am looking for a way to replace keywords within an HTML string with a variable. At the moment I am using the following example.
returnString = Replace(message, "[CustomerName]", customerName, CompareMethod.Text)
The above works fine if the formatting wraps the whole keyword, e.g.:
<b>[CustomerName]</b>
However, if the formatting splits the keyword partway through, the string is not found and thus not replaced, e.g.:
<b>[Customer</b>Name]
The formatting of the string is out of my control and isn't foolproof. With this in mind what is the best approach to find a keyword within a html string?
Try using a regular expression. You can create and test your expressions here; I used this and it works well.
http://regex-test.com/validate/javascript/js_match
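One way to make the regex suggestion concrete, sketched in JavaScript since that is what the linked tester targets: build a pattern from the keyword that tolerates any number of tags between two adjacent characters. The function name replaceSplitKeyword is hypothetical, and re-emitting the swallowed tags after the replacement value is a design choice of this sketch so the markup stays balanced. The value is inserted as-is, so it is assumed to be safe HTML already.

```javascript
// Hypothetical helper: replaces a keyword even when tags interrupt it,
// e.g. "<b>[Customer</b>Name]".
function replaceSplitKeyword(html, keyword, value) {
  var gap = "(?:<[^>]*>)*";  // zero or more tags between two characters
  var parts = keyword.split("");
  for (var i = 0; i < parts.length; i++) {
    // escape characters that are special inside a regular expression
    parts[i] = parts[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }
  var pattern = new RegExp(parts.join(gap), "gi");
  return html.replace(pattern, function (match) {
    // keep any tags that sat inside the keyword, placed after the value
    var tags = match.match(/<[^>]*>/g) || [];
    return value + tags.join("");
  });
}

console.log(replaceSplitKeyword("<b>[Customer</b>Name]", "[CustomerName]", "Alice"));
```

The "i" flag mirrors the case-insensitive CompareMethod.Text used in the question.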
Use the textContent property (innerText in older IE) instead of innerHTML if you're using JavaScript to access the content. That strips all tags from the content and gives you back a clean text representation of it.
For example, if the content looks like this:
<div id="name">
<b>[Customer</b>Name]
</div>
Then accessing its text content gives:
var el = document.getElementById("name");
var name = el.textContent || el.innerText;
// name now holds "[CustomerName]" (plus surrounding whitespace) without the tags
which should be easy to process. Do a regex search now if you need to.
Edit: Since you're doing this processing on the server side, process the XML recursively and collect the text of each node. Since I'm not big on VB.Net, here's some pseudocode:
getNodeText(node) {
    text = ""
    for each node.children as child {
        if child.type == TextNode {
            text += child.text
        }
        else {
            text += getNodeText(child)
        }
    }
    return text
}
myXml = xml.load(<html>)
print getNodeText(myXml)
And then replace or whatever there is to be done!
I have found what I believe is a solution to this issue. Well in my scenario it is working.
The html input has been tweaked to place each custom field or keyword within a div with a set id. I have looped through all of the elements within the html string using mshtml and have set the inner text to the correct value when a match is found.
e.g.
' Requires a reference to Microsoft.mshtml (Imports mshtml)
Function ReplaceDetails(ByVal message As String, ByVal customerName As String) As String
    Dim returnString As String = String.Empty
    Dim doc As IHTMLDocument2 = New HTMLDocument
    doc.write(message)
    doc.close()
    ' Walk every element and fill in the ones whose id matches a keyword
    For Each el As IHTMLElement In doc.body.all
        If (el.id = "Date") Then
            el.innerText = Now.ToShortDateString
        End If
        If (el.id = "CustomerName") Then
            el.innerText = customerName
        End If
    Next
    returnString = doc.body.innerHTML
    Return returnString
End Function
Thanks for all of the input. I'm glad to have a solution to the problem.