Replace keyword within HTML string

I am looking for a way to replace keywords within an HTML string with a variable. At the moment I am using the following:
returnString = Replace(message, "[CustomerName]", customerName, CompareMethod.Text)
This works fine if the HTML formatting wraps the entire keyword,
e.g.
<b>[CustomerName]</b>
However, if the formatting is split within the keyword, the string is not found and therefore not replaced,
e.g.
<b>[Customer</b>Name]
The formatting of the string is out of my control and isn't consistent. With this in mind, what is the best approach to finding a keyword within an HTML string?

Try using a regular expression. You can build and test your expressions here; I used this site and it works well.
http://regex-test.com/validate/javascript/js_match
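One way to make a regular expression survive split formatting is to allow an optional run of tags between every character of the keyword. The Java sketch below does that; the class and method names are made up for illustration. It is case-insensitive to mirror CompareMethod.Text, it assumes no tag contains a ">" inside an attribute value, and any tags found inside the matched keyword are written back after the replacement value so the surrounding markup stays balanced.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTolerantReplacer {
    // A single tag such as "</b>" or "<span class='x'>" (assumes no '>' inside attribute values).
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Builds a pattern for the keyword that allows any run of tags between its characters,
    // so "[CustomerName]" also matches "[Customer</b>Name]". Case-insensitive, like CompareMethod.Text.
    static Pattern tolerantPattern(String keyword) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < keyword.length(); i++) {
            if (i > 0) {
                regex.append("(?:<[^>]*>)*");
            }
            // Quote each character so '[' and ']' are matched literally.
            regex.append(Pattern.quote(String.valueOf(keyword.charAt(i))));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Replaces every occurrence of the keyword with the value, then re-emits the tags that
    // were interleaved in the keyword so the surrounding markup stays balanced.
    static String replace(String html, String keyword, String value) {
        Matcher m = tolerantPattern(keyword).matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start()).append(value);
            Matcher tag = TAG.matcher(m.group());
            while (tag.find()) {
                out.append(tag.group());
            }
            last = m.end();
        }
        return out.append(html, last, html.length()).toString();
    }
}
```

For example, replace("<b>[Customer</b>Name]", "[CustomerName]", "Bob") gives "<b>Bob</b>". The value is appended literally, so characters such as "$" in a customer's name are not treated as group references. The same pattern-building idea should carry over to .NET's Regex class.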

If you're using JavaScript to access the content, use the textContent property instead of innerHTML. That strips all tags from the content and gives you back a clean text representation of the customer's name.
For example, if the content looks like this:
<div id="name">
<b>[Customer</b>Name]
</div>
Then reading its textContent property gives:
var name = document.getElementById("name").textContent;
// name is "[CustomerName]" (plus surrounding whitespace), without the tags
which should be easy to process. Do a regex search now if you need to.
Edit: Since you're doing this processing on the server side, process the XML recursively and collect the text of each node. Since I'm not big on VB.NET, here's some pseudocode:
getNodeText(node) {
text = ""
for each node.children as child {
if child.type == TextNode {
text += child.text
}
else {
text += getNodeText(child);
}
}
return text
}
myXml = xml.load(<html>);
print getNodeText(myXml);
Then do the replacement (or whatever else needs to be done) on the extracted text.
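For reference, here is a rough Java translation of the getNodeText pseudocode using the JDK's built-in DOM parser. It is only a sketch: the class name NodeText is hypothetical, and it assumes the markup is well-formed XML (XHTML), because javax.xml.parsers rejects ordinary HTML such as an unclosed <br>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeText {
    // Java version of getNodeText: append text nodes, recurse into every other child.
    static String getNodeText(Node node) {
        StringBuilder text = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                text.append(child.getNodeValue());
            } else {
                text.append(getNodeText(child));
            }
        }
        return text.toString();
    }

    // Parses the markup with the JDK's XML parser. Assumption: the input is well-formed XHTML.
    static String textOf(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return getNodeText(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

Calling NodeText.textOf("<div><b>[Customer</b>Name]</div>") returns "[CustomerName]". Note that this yields plain text only; putting a replaced value back into the original markup is a separate step.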

I have found what I believe is a solution to this issue; at least in my scenario it works.
I tweaked the HTML input to wrap each custom field or keyword in a div with a set id. I then loop through all of the elements in the HTML string using MSHTML and set the inner text to the correct value when a match is found.
e.g.
' Requires a project reference to Microsoft.mshtml and "Imports mshtml" at the top of the file
Function ReplaceDetails(ByVal message As String, ByVal customerName As String) As String
Dim returnString As String = String.Empty
Dim doc As IHTMLDocument2 = New HTMLDocument
doc.write(message)
doc.close()
' Fill each placeholder element identified by its id
For Each el As IHTMLElement In doc.body.all
If (el.id = "Date") Then
el.innerText = Now.ToShortDateString
End If
If (el.id = "CustomerName") Then
el.innerText = customerName
End If
Next
returnString = doc.body.innerHTML
Return returnString
End Function
Thanks for all of the input. I'm glad to have a solution to the problem.
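The same id-based idea can be sketched in Java with the JDK's DOM API instead of MSHTML. This is an illustration under assumptions: the class IdPlaceholderFiller is hypothetical, the input must be well-formed XHTML, and setting the text content of a placeholder element discards any formatting tags inside it, much like assigning innerText.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IdPlaceholderFiller {
    // Sets the text of every element whose id is a key in valuesById, replacing whatever
    // markup was inside it (so split formatting such as <b>[Customer</b>Name] disappears).
    static String fill(String xhtml, Map<String, String> valuesById) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // getElementsByTagName returns a live list; copy it before changing the tree.
            NodeList live = doc.getElementsByTagName("*");
            List<Element> elements = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                elements.add((Element) live.item(i));
            }
            for (Element el : elements) {
                String value = valuesById.get(el.getAttribute("id"));
                if (value != null) {
                    el.setTextContent(value); // stored as text, escaped on output
                }
            }
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process input as XML", e);
        }
    }
}
```

Because the value is stored as a text node, a customer name containing "<" or "&" is escaped on output rather than being interpreted as markup.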

Related

How to Remove a specific Img tag from string

Using visual basic
I have a string that contains HTML inside of it. There may be many img tags inside of it, but there is an img tag with a specific alt attribute that I want to remove.
How do I remove the entire img tag from the string if it contains 'badImage' as the alt attribute? I still want to keep any other img tags that may be inside the string.
Dim myString As String = "<html><body><span>some text here..</span><img src='#' alt='goodImage'/><span>more text...</span><img src='#' alt='badImage'/></body></html>"
I have the following code so far, but it removes ALL img tags from the string, whereas I only want to remove the img tag with the 'badImage' alt attribute. Is this possible?
Dim imgRegex As New Regex("<img[^>]*>", RegexOptions.IgnoreCase)
myString = imgRegex.Replace(myString, "")
Please answer in VB.Net. Thanks for any assistance!
Assuming the HTML source is well-formed markup (XHTML/XML), you can load it into an XmlDocument and remove the bad elements, detecting them by their "alt" attribute.
A little code demonstration:
Function ClearBadImgTags(source As String) As String
Dim xDoc As XmlDocument = New XmlDocument
Try
xDoc.LoadXml(source)
Dim badImgs As List(Of XmlElement) = (From el In xDoc.GetElementsByTagName("img")
Select img = CType(el, XmlElement)
Where img.HasAttribute("alt") AndAlso img.Attributes("alt").Value = "badImage").ToList()
' Materialize the query with ToList first: re-running it over the live XmlNodeList after each removal would shift the indexes
For Each badImg As XmlElement In badImgs : badImg.ParentNode.RemoveChild(badImg) : Next
Return xDoc.OuterXml
Catch ex As Exception
Stop 'Bad XML or something went wrong
End Try
Return ""
End Function
Then:
Dim myString As String = "<html><body><span>some text here..</span><img src='#' alt='goodImage'/><span>more text...</span><img src='#' alt='badImage'/></body></html>"
Dim newString As String = ClearBadImgTags(myString)
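A Java sketch of the same XmlDocument-style approach, using the JDK's DOM parser. BadImageRemover is a hypothetical name, and the input is again assumed to be well-formed XML. The matching images are collected first because getElementsByTagName returns a live list that changes as nodes are removed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BadImageRemover {
    // Removes every <img> whose alt attribute equals `alt`, keeping all other images.
    static String removeImagesWithAlt(String xhtml, String alt) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList live = doc.getElementsByTagName("img");
            List<Element> bad = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                Element img = (Element) live.item(i);
                if (alt.equals(img.getAttribute("alt"))) {
                    bad.add(img);
                }
            }
            // Remove only after the scan, so the live list is not modified mid-loop.
            for (Element img : bad) {
                img.getParentNode().removeChild(img);
            }
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

The serializer writes attributes with double quotes, so the output will not be byte-for-byte identical to the input even where nothing was removed.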

Libgdx: How to show HTML text in a label?

I have a string like this:
"noun<br> an expression of greeting <br>- every morning they exchanged polite hellos<br> <font color=dodgerblue> ••</font> Syn: hullo, hi, howdy, how-do-you-do<be>"
I want to show it in a label as rich text; for example, instead of displaying the <br> tags, the text should continue on the next line.
In Android we can do that with:
Html.fromHtml(myHtmlString)
But I don't know how to do it in libgdx.
I tried using Jsoup, but it removes all tags and, for example, does not start a new line for a <br> tag:
Jsoup.parse(myHtmlString).text()
Jsoup.parse returns a Document made up of many elements, not a single string, and text() returns only their combined text with whitespace normalized, so the line breaks from <br> are lost. You can assemble the complete string yourself by going through the elements, or try
Document doc = Jsoup.parse(yourHtmlInput);
String htmlString = doc.toString();
String htmlText = "<p>This is an <strong>Example</strong></p>";
//this will convert your HTML text into normal text
String normalText = Jsoup.parse(htmlText).text();
In Kotlin I use this code:
var definition = "my html string"
definition = definition.replace("<br>", "\n")
definition = definition.replace("<[^>]*>".toRegex(), "")
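The Kotlin approach translates directly to Java. This is a sketch (HtmlToPlainText is a hypothetical name): it turns <br>, <br/> and <br /> into newlines and strips every other tag, so colors and other formatting are lost and entities such as &amp; are not decoded. A libgdx Label should then show each \n as a line break.

```java
import java.util.regex.Pattern;

class HtmlToPlainText {
    // <br>, <br/>, <br /> in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");
    // Any remaining tag, e.g. <font color=dodgerblue> or </font>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String toPlainText(String html) {
        String withNewlines = BR.matcher(html).replaceAll("\n");
        return TAG.matcher(withNewlines).replaceAll("");
    }
}
```

The <br> replacement has to run first; otherwise the generic tag pattern would delete the line breaks along with everything else.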

LibreOffice Basic Macro command converting Calc cellRange to RTF/HTML

My goal is to fill a LibreOffice calc sheet, and silently send a cell range by email when the user clicks the send-off button (and once more to confirm).
So there are three parts to this:
A push button with a request to confirm. (Easy and done.)
Select Cell Range and turn it into rich text format (Haven't yet found)
Send rich text email from within the sheet. (Will tackle the "silent" part later)
I tried copying the range to the clipboard with unoService but it seemed over-complicated and full of errors.
Here's what I have:
''''Send by e-mail enriched text
Sub Main
Dim Doc, Sheet, Range, Rtf, Exec as Object
End Sub
'Confirm it
Sub SendTableApproval
If MsgBox ("Ready to email?", MB_YESNO + MB_DEFBUTTON2) = IDYES Then
CopyTable()
End If
End Sub
'Copy it
Sub CopyTable
Doc = ThisComponent
View = Doc.CurrentController
Frame = View.Frame
Sheet = Doc.Sheets.getByIndex(0)
Range = Sheet.getCellrangeByName("a1:f45")
Exec = createUnoService("com.sun.star.frame.DispatchHelper")
View.Select(Range)
Cells = View.getTransferable()
Exec.executeDispatch(Frame, ".uno:Deselect", "", 0, array())
'SimpleMailTo(Cells)
End Sub
'Mail it
Sub SimpleMailTo(body)
Dim launcher as object
Dim eAddress, eSubject, eBody, aHTMLanchor as string
launcher = CreateUnoService("com.sun.star.system.SystemShellExecute")
eAddress = "tu#domo.eg"
eSubject = "Cotidie agenda futuendane"
eBody = body
aHTMLanchor = "mailto:" & eAddress & "?subject=" & eSubject & "&body=" & eBody
launcher.execute(aHTMLanchor, "", 0)
End Sub
After three days of researching methods, properties and UNO services, I still do not know how to do this.
My question is, simply put, How can I convert a transferable content to HTML/RTF?
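A side note on SimpleMailTo: the subject and body of a mailto: link need to be percent-encoded, and fields are separated by a single "&". Also, as far as I know, RFC 6068 treats the body field as plain text, so HTML placed there will most likely show up as raw markup rather than rich text. The Java sketch below (MailtoLink is a hypothetical name) shows the encoding; URLEncoder is designed for HTML forms and encodes spaces as "+", hence the switch to %20.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class MailtoLink {
    // Percent-encodes a field value for a mailto: URI. A literal '+' in the input is
    // already encoded as %2B, so replacing '+' afterwards only affects spaces.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds mailto:address?subject=...&body=... with a single '&' between fields.
    static String build(String address, String subject, String body) {
        return "mailto:" + address + "?subject=" + encode(subject) + "&body=" + encode(body);
    }
}
```

The URLEncoder.encode(String, Charset) overload needs Java 10 or later.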
Simply copying and pasting into an email produces the result you are asking for. The code on the LibreOffice side should look like this.
dispatcher.executeDispatch(document, ".uno:Copy", "", 0, Array())
It sounds like you already tried this, but something didn't work. Perhaps you could elaborate on what went wrong.
Another approach would be to write the spreadsheet to a temporary HTML or XHTML file. Then parse the temporary file to grab the part needed for the email.
AFAIK there is no such command to turn a cell range into rich text format with UNO. To do it that way, you would need to loop through each text range of each cell, read its formatting properties and then generate the HTML yourself.
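If you do end up generating the HTML yourself, the simplest version ignores formatting and just escapes the cell values into a table. The Java sketch below assumes the values have already been read into an Object[][] (for example via XCellRangeData.getDataArray(), which is not shown); CellRangeToHtml is a hypothetical name, and fonts, colors and merged cells are not handled.

```java
class CellRangeToHtml {
    // Escapes the characters that would otherwise be read as markup.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // One <tr> per row, one <td> per cell; values are converted with String.valueOf.
    static String toHtmlTable(Object[][] rows) {
        StringBuilder html = new StringBuilder("<table>\n");
        for (Object[] row : rows) {
            html.append("  <tr>");
            for (Object cell : row) {
                html.append("<td>").append(escape(String.valueOf(cell))).append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.append("</table>").toString();
    }
}
```

Per-cell formatting would have to be read separately, text range by text range, and turned into style attributes.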
EDIT:
Good idea about XTransferable. The following Java code adapted from the DevGuide gets an HTML string and then prints it. I believe this would be a good solution for your needs.
public void displayHTMLFromClipboard()
{
try
{
Object oClipboard = xMCF.createInstanceWithContext(
"com.sun.star.datatransfer.clipboard.SystemClipboard", xContext);
XClipboard xClipboard = (XClipboard)
UnoRuntime.queryInterface(XClipboard.class, oClipboard);
XTransferable xTransferable = xClipboard.getContents();
DataFlavor[] aDflvArr = xTransferable.getTransferDataFlavors();
System.out.println("Available clipboard formats:");
DataFlavor aChosenFlv = null;
for (int i=0;i<aDflvArr.length;i++)
{
System.out.println(
"MimeType: " + aDflvArr[i].MimeType +
" HumanPresentableName: " + aDflvArr[i].HumanPresentableName );
if (aDflvArr[i].MimeType.equals("text/html"))
{
aChosenFlv = aDflvArr[i];
}
}
System.out.println("");
try
{
if (aChosenFlv != null)
{
System.out.println("HTML text on the clipboard...");
Object aData = xTransferable.getTransferData(aChosenFlv);
String s = new String((byte[])aData, Charset.forName("ISO-8859-1"));
System.out.println(s);
}
}
catch (UnsupportedFlavorException exc)
{
exc.printStackTrace();
}
}
catch(com.sun.star.uno.Exception exc)
{
exc.printStackTrace();
}
}
If you plan to use Basic, it might be a good idea to do some research into the proper way to convert bytes. The code I have below seems to work but is probably unreliable and unsafe, and will not work for many languages. A few of my initial attempts crashed before this finally worked.
Sub DisplayClipboardData
oClipboard = createUnoService("com.sun.star.datatransfer.clipboard.SystemClipboard")
xTransferable = oClipboard.getContents()
aDflvArr = xTransferable.getTransferDataFlavors()
For i = LBound(aDflvArr) To UBound(aDflvArr)
If aDflvArr(i).MimeType = "text/html" Then
Dim aData() As Byte
aData = xTransferable.getTransferData(aDflvArr(i))
Dim s As String
For j = LBound(aData) to UBound(aData)
s = s & Chr(aData(j)) 'XXX: Probably a bad way to do this!
Next j
Print(s)
End If
Next
End Sub
One more suggestion: Python might be a better language choice here. In many ways, using Python with LibreOffice is easier than Java. And unlike Basic, Python is powerful enough to comfortably handle byte strings.

Loop Through HTML Elements and Nodes

I'm working on an HTML page highlighter project but ran into problems when a search term is the name of an HTML tag attribute or a class/ID name; e.g. if the search terms are "media OR class OR content" then my find and replace does this:
<link href="/css/DocHighlighter.css" <span style='background-color:yellow;font-weight:bold;'>media</span>="all" rel="stylesheet" type="text/css">
<div <span style='background-color:yellow;font-weight:bold;'>class</span>="container">
I'm using Lucene for highlighting and my current code (sort of):
InputStreamReader xmlReader = new InputStreamReader(xmlConn.getInputStream(), "UTF-8");
Highlighter hl = null;
if (searchTerms != null && !searchTerms.isEmpty()) {
QueryScorer qryScore = new QueryScorer(qp.parse(searchTerms));
hl = new Highlighter(new SimpleHTMLFormatter(hlStart, hlEnd), qryScore);
}
if (xmlReader != null) {
BufferedReader br = new BufferedReader(xmlReader);
String inputLine;
while ((inputLine = br.readLine()) != null) {
String tmp = inputLine.trim();
if (hl != null) {
StringReader strReader = new StringReader(tmp);
HTMLStripCharFilter htm = new HTMLStripCharFilter(strReader.markSupported() ? strReader : new BufferedReader(strReader));
String tHL = hl.getBestFragment(analyzer, "", htm);
tmp = (tHL == null ? tmp : tHL);
}
xmlDoc += tmp;
}
br.close();
}
As you can see (if you understand Lucene highlighting), this does an indiscriminate find/replace. Since my document will be HTML and the search terms are dictated by users, there is no way for me to restrict parsing to certain elements or tags. Also, since the find/replace basically loops and appends the HTML to a string (the return type of the method), I have to keep all HTML tags and values in place and in order. I've tried using Jsoup to loop through the page, but it handles the HTML tag as one big result. I also tried TagSoup to clean up the broken HTML caused by the problem, but it doesn't work correctly. Does anyone know how to loop through the elements and nodes (data values) of an HTML document?
I've been having the most luck with this:
StringBuilder sb = new StringBuilder();
sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html>");
Document doc = Jsoup.parse(txt.getResult());
Elements elements = doc.getAllElements();
for (Element e : elements) {
if (!(e.tagName().equalsIgnoreCase("#root"))) {
sb.append("<" + e.tagName() + e.attributes() + ">" + e.ownText() + "\n");
}// end if
}// end for
return sb;
The one snag is that the nesting isn't always "repaired" properly (the loop never writes closing tags), but the result is close. I'm still working on this.
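The underlying idea, touching only text nodes so that words like "class" or "media" inside tags and attributes are never highlighted, can be sketched with the JDK's DOM API. This is not Lucene's highlighter: the class TextNodeHighlighter and the span class "hl" are hypothetical, the search terms are given as a plain regular expression, and the page is assumed to be well-formed XHTML (real-world HTML would need a lenient parser first).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextNodeHighlighter {
    // Collects text nodes only; tag names and attribute values are never visited.
    static void collectTextNodes(Node node, List<Node> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.add(child);
            } else {
                collectTextNodes(child, out);
            }
        }
    }

    // Wraps each match of `term` inside text nodes in <span class="hl">...</span>.
    static String highlight(String xhtml, Pattern term) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            List<Node> textNodes = new ArrayList<>();
            collectTextNodes(doc.getDocumentElement(), textNodes);
            // Nodes were collected first, so inserting spans does not disturb the walk.
            for (Node text : textNodes) {
                String value = text.getNodeValue();
                Matcher m = term.matcher(value);
                if (!m.find()) {
                    continue;
                }
                Node parent = text.getParentNode();
                int last = 0;
                do {
                    parent.insertBefore(doc.createTextNode(value.substring(last, m.start())), text);
                    Element span = doc.createElement("span");
                    span.setAttribute("class", "hl");
                    span.setTextContent(m.group());
                    parent.insertBefore(span, text);
                    last = m.end();
                } while (m.find());
                // Keep the tail after the last match, then drop the original text node.
                parent.insertBefore(doc.createTextNode(value.substring(last)), text);
                parent.removeChild(text);
            }
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not highlight input as XML", e);
        }
    }
}
```

Because the output is produced by serializing the modified tree, tags are always closed and nested correctly, which avoids the "repair" problem of rebuilding the markup by hand.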

HTML Agility Pack - Get Page Summary

How would I use the HTML Agility Pack to get the first paragraph of text from the body of an HTML file? I'm building a Digg-style link submission tool and want to get the title and the first paragraph of text. The title is easy; any suggestions for how I might get the first paragraph of text from the body? I guess it could be within a P or a DIV depending on the page.
Is this HTML that you control? If so, you could give the p an id or a class and find it via
//p[@id='YOUR ID'] or //p[@class='YOUR CLASS']
EDIT:
Since you don't control the HTML, maybe the code below will work. It walks all the HtmlTextNodes and returns the first run of text at least as long as the specified threshold. It's far from perfect but might get you going in the right direction.
String summary = FindSummary(page.DocumentNode);
private const int THRESHOLD = 50;
private String FindSummary(HtmlAgilityPack.HtmlNode node) {
foreach (HtmlAgilityPack.HtmlNode childNode in node.ChildNodes) {
if (childNode.GetType() == typeof(HtmlAgilityPack.HtmlTextNode)) {
if (childNode.InnerText.Length >= THRESHOLD) {
return childNode.InnerText;
}
}
String summary = FindSummary(childNode);
if (summary.Length >= THRESHOLD) {
return summary;
}
}
return String.Empty;
}
The Agility Pack uses XPath for querying, so once the HTML is loaded you can use a simple XPath statement. Something like...
HtmlDocument htmldoc = new HtmlDocument();
htmldoc.LoadHtml(content);
// (//p)[1] is the first <p> in the document; //p[1] would return the first <p> child of every parent
HtmlNode firstParagraph = htmldoc.DocumentNode.SelectSingleNode("(//p)[1]");
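A quick way to see how XPath positions behave, using the JDK's javax.xml.xpath on a small XHTML fragment. The class FirstParagraphXPath and the sample page are made up for illustration; the HTML Agility Pack's SelectNodes/SelectSingleNode accept the same XPath 1.0 expressions, so the distinction should carry over.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstParagraphXPath {
    // Evaluates an XPath 1.0 expression against a well-formed XHTML string.
    static NodeList select(String xhtml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(expression,
                    new InputSource(new StringReader(xhtml)), XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath expression or input", e);
        }
    }
}
```

On a page such as "<html><body><div><p>first</p></div><p>second</p></body></html>", //p[1] selects both paragraphs, because each one is the first <p> child of its own parent, while (//p)[1] selects only the paragraph containing "first".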