How to insert html in Microsoft Word Placeholder - html

I have this situation:
The user has an editor on his page and he enters text (with colors, formatting, hyperlinks, and optionally pictures). When he clicks Submit, the data from the editor (with its formatting intact) must be sent to a specific placeholder in a Microsoft Office Word document.
I am using the OpenXml SDK to write to the document, and I tried HtmlToOpenXml so I can read the HTML.
With HtmlToOpenXml I get a couple of paragraphs from the user's HTML string, and now I have to insert them into the content control. Do you know how I can find the control and append them to it (if possible)?

I managed to fix this and here is the code I used
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
// HtmlConverter comes from the HtmlToOpenXml library; add its using as well
// (the namespace depends on the package version)

//name of the file which will be saved
const string filename = "test.docx";
//html string to be inserted and rendered in the word document
string html = @"<b>Test</b>";
//the Html2OpenXML dll supports all the common html tags
//open the template document with the content controls in it (in my case I used a Rich Text content control)
byte[] byteArray = File.ReadAllBytes("..."); // template path
using (MemoryStream generatedDocument = new MemoryStream())
{
    generatedDocument.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument doc = WordprocessingDocument.Open(generatedDocument, true))
    {
        MainDocumentPart mainPart = doc.MainDocumentPart;
        //just in case
        if (mainPart == null)
        {
            mainPart = doc.AddMainDocumentPart();
            new Document(new Body()).Save(mainPart);
        }
        HtmlConverter converter = new HtmlConverter(mainPart);
        Body body = mainPart.Document.Body;
        //sdtElement is the Content Control we need.
        //"Html" is the name (alias) of the placeholder we are looking for
        SdtElement sdtElement = doc.MainDocumentPart.Document.Descendants<SdtElement>()
            .Where(
                element =>
                    element.SdtProperties.GetFirstChild<SdtAlias>() != null &&
                    element.SdtProperties.GetFirstChild<SdtAlias>().Val == "Html").FirstOrDefault();
        //the HtmlConverter returns a set of paragraphs.
        //they hold the data we want to insert into the document, with its formatting
        //After that we just need to append all paragraphs to the Content Control and save the document
        var paragraphs = converter.Parse(html);
        for (int i = 0; i < paragraphs.Count; i++)
        {
            sdtElement.Append(paragraphs[i]);
        }
        mainPart.Document.Save();
    }
    File.WriteAllBytes(filename, generatedDocument.ToArray());
}

Related

HTML.TextAreaFor - removing html tags for display only

In an MVC application I have to use @Html.TextAreaFor to display some text from a database. The trouble is that the text sometimes contains HTML tags, and I can't see a way to remove those for display only.
Is it possible to do this in the view (maybe with CSS?) without having to strip the tags in the controller first?
EDIT
The data coming from the controller contains HTML tags which I do not want to remove; I just don't want to display them.
Normally I would use @Html.Raw, but it has to work in an @Html.TextAreaFor control.
If you want to decode HTML returned from the controller, you can use the following JavaScript method.
It strips tags and decodes entities; for example, it decodes "Chris&apos; corner" to "Chris' corner".
var decodeEntities = (function () {
    // this prevents any overhead from creating the object each time
    var element = document.createElement('div');
    function decodeHTMLEntities(str) {
        if (str && typeof str === 'string') {
            // strip script/html tags
            str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
            str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
            element.innerHTML = str;
            str = element.textContent;
            element.textContent = '';
        }
        return str;
    }
    return decodeHTMLEntities;
})();
You can do this by using Razor code in your view:
@Html.Raw(HttpUtility.HtmlDecode(Model.Content))
If I set Model.Content to the string "<strong>This is me</strong><button>click</button>", the code above renders it as HTML, so the output is bold text next to a button.
There are some nice rich text editor libraries, such as CKEditor, Quill, or TinyMCE, that can display HTML while still keeping the capabilities of a text editor. All of these libraries can also be made read-only if necessary.
Sorted this by changing TextAreaFor to TextBoxFor and setting a formatted value.
@Html.TextBoxFor(x => Model.MyItem, new { @class = "form-control", @required = "true", Value = Regex.Replace(Model.MyItem, "<.*?>", String.Empty) })
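For comparison, here is a minimal JavaScript sketch of the same non-greedy tag-stripping pattern used in the TextBoxFor line above. The function name stripTags is hypothetical and not part of any library. The pattern is a display-only shortcut, not an HTML sanitizer, and adjacent elements run together once their tags are gone.

```javascript
// Hypothetical helper mirroring Regex.Replace(value, "<.*?>", String.Empty).
// It removes anything that looks like a tag and leaves the text between tags.
function stripTags(html) {
  return String(html).replace(/<.*?>/g, '');
}

console.log(stripTags('<strong>This is me</strong><button>click</button>'));
// prints "This is meclick"
```

If the missing space between "me" and "click" matters, one option is to replace tags with a space instead of an empty string and trim the result afterwards.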

Preserve highlight when copy from Ace Editor

I am using Ace Editor in my web app. I wonder whether it's possible to copy the text inside Ace Editor to the clipboard with its highlighting. With the default configuration, if I copy the selected text in Ace Editor to the clipboard, it seems that only the text content is copied, with no HTML styles.
Thanks a lot for your help!
Unfortunately there is no API for this. You'll need to modify https://github.com/ajaxorg/ace/blob/v1.2.5/lib/ace/keyboard/textinput.js#L290 to also set the text/html MIME type to some HTML, rendered similarly to https://github.com/ajaxorg/ace/blob/v1.2.5/lib/ace/layer/text.js#L442.
You'll also need to include the CSS for the theme in the copied HTML.
I know this is late, but it might help someone who stumbles upon this problem as I did.
The basic idea is to get the text that is being copied and use Ace's tokenizer to generate HTML from it:
Add an event listener for the copy/cut event on the editor's container.
Use the clipboard object in the event to get the data currently being copied: event.clipboardData?.getData('text/plain')
The remaining steps are in the code below.
// Assumed context: this is the body of a function that returns the HTML string.
// `aceSession` is the editor's session and `lines` is the copied text split
// into lines, e.g. text.split('\n').
// get current tokenizer
const tokenizer = aceSession.getMode().getTokenizer();
// get `Text` object from ace, this will help in generating HTML
const Text = ace.require('ace/layer/text').Text;
// create a wrapper div, all your resultant HTML will come inside this
// also this will contain the basic HTML required to initialize the editor
const root = document.createElement('div');
// this is the main magic object
const rootText = new Text(root);
lines.forEach(line => {
    // this converts your text to tokens
    const tokens = tokenizer.getLineTokens(line, 'start') as any;
    const leadingSpacesCount = (line.match(/^\s*/) || [])[0].length;
    const lineGroupEl = document.createElement('div');
    lineGroupEl.className = 'ace_line_group';
    const lineEl = document.createElement('div');
    lineEl.className = 'ace_line';
    const spaceSpan = document.createElement('span');
    if (tokens && tokens.tokens.length) {
        //////////////////////////////////////////////////////
        // Magic happens here, this line is responsible for converting our tokens to HTML elements
        rootText.$renderSimpleLine(lineEl, tokens.tokens);
        //////////////////////////////////////////////////////
        // Leading spaces do not get translated to HTML, add them separately
        spaceSpan.innerHTML = ' '.repeat(leadingSpacesCount);
        lineEl.insertBefore(spaceSpan, lineEl.children[0]);
    } else {
        spaceSpan.innerHTML = ' ';
        lineEl.appendChild(spaceSpan);
    }
    lineGroupEl.appendChild(lineEl);
    // `root` has a wrapper div, inside which our main div (with class "ace_layer ace_text-layer") lies
    root.children[0].appendChild(lineGroupEl);
});
return root.innerHTML;
Finally, in your event listener you can wrap this in a div to give it your own custom color, and put it into clipboardData with the text/html MIME type:
event.clipboardData?.setData('text/html', htmlContent);
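The listener step described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Ace's own API: attachHighlightedCopy and toHtml are hypothetical names, container is assumed to be the editor's DOM element, and toHtml stands for whatever function turns the copied plain text into highlighted HTML (for example, the tokenizer-based code above wrapped in a function).

```javascript
// Hypothetical helper: registers one handler for both copy and cut.
// `container` only needs an addEventListener method; `toHtml` is any
// function mapping plain text to an HTML string (an assumption here).
function attachHighlightedCopy(container, toHtml) {
  function onCopy(event) {
    const data = event.clipboardData;
    if (!data) return; // clipboardData is not available on this event
    const text = data.getData('text/plain');
    if (!text) return; // nothing has been placed on the clipboard yet
    // Leave the plain-text flavor alone and add an HTML flavor next to it.
    data.setData('text/html', toHtml(text));
    // Data set in a copy/cut handler is only used if the default is cancelled.
    // This assumes the editor performs the actual cut of the selection itself.
    event.preventDefault();
  }
  container.addEventListener('copy', onCopy);
  container.addEventListener('cut', onCopy);
  return onCopy;
}

// Usage in a browser (editorContainer and renderHighlightedHtml are assumed names):
// attachHighlightedCopy(editorContainer, text => renderHighlightedHtml(text));
```

Returning the handler makes it possible to remove the listeners later with removeEventListener if the editor is torn down.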

iTextSharp support for HTML controls conversion C#

Does iTextSharp's HTML to PDF conversion support controls like text boxes, buttons, etc.? Or do we need to use an iTextSharp class like TextField to implement controls during PDF conversion?
iTextSharp doesn't support conversion of text boxes, buttons, etc. Most likely you will need to implement your own logic if you want to convert an HTML page (with text boxes, buttons, etc.) to a PDF document. You can find all supported tags and styles here. You can also check whether an element is supported using this simple example:
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

byte[] bytes;
using (var stream = new MemoryStream())
{
    using (var document = new Document())
    {
        using (var writer = PdfWriter.GetInstance(document, stream))
        {
            document.Open();
            var html = @"<p>Before the button</p><br/><input type=""submit"" value=""Click me""/><br/><p>After the button</p>";
            using (var reader = new StringReader(html))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader);
            }
            document.Close();
        }
    }
    bytes = stream.ToArray();
}
File.WriteAllBytes("test.pdf", bytes);
If you run this example, you'll see that the input element is not part of the final document.

Can use of HtmlAgilityPack be modified to only extract main part of HTML document?

I have some .NET code that ingests HTML files and extracts text from them. I am using HtmlAgilityPack to do the extraction. Previously I wanted to extract most of the text that was there, so it worked fine. Now the requirements have changed and I need to extract text only from the main body of the document. So suppose I scraped HTML from a news webpage. I just want the text of the article, not the ads, the titles of other (albeit related) articles, headers/footers, etc.
Is it possible to modify my calls to HtmlAgilityPack to extract only the main text? Or is there an alternative way to do the extraction?
Here's the current block of code that gets text from HTML:
using System.IO;
using HtmlAgilityPack;

public string ConvertHtml(string html)
{
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    StringWriter sw = new StringWriter();
    ConvertTo(doc.DocumentNode, sw);
    sw.Flush();
    return sw.ToString();
}

public void ConvertTo(HtmlNode node, TextWriter outText)
{
    string html;
    switch (node.NodeType)
    {
        case HtmlNodeType.Comment:
            // don't output comments
            break;
        case HtmlNodeType.Document:
            ConvertContentTo(node, outText);
            break;
        case HtmlNodeType.Text:
            // script and style must not be output
            string parentName = node.ParentNode.Name;
            if ((parentName == "script") || (parentName == "style"))
                break;
            // get text
            html = ((HtmlTextNode) node).Text;
            // is it in fact a special closing node output as text?
            if (HtmlNode.IsOverlappedClosingElement(html))
                break;
            // check the text is meaningful and not a bunch of whitespaces
            if (html.Trim().Length > 0)
            {
                outText.Write(HtmlEntity.DeEntitize(html));
            }
            break;
        case HtmlNodeType.Element:
            switch (node.Name)
            {
                case "p":
                    // treat paragraphs as crlf
                    outText.Write("\r\n");
                    break;
            }
            if (node.HasChildNodes)
            {
                ConvertContentTo(node, outText);
            }
            break;
    }
}

private void ConvertContentTo(HtmlNode node, TextWriter outText)
{
    foreach (HtmlNode subnode in node.ChildNodes)
    {
        ConvertTo(subnode, outText);
    }
}
So, ideally, what I want is to let HtmlAgilityPack determine which parts of the input HTML constitute the "main" text block and extract text from only those elements. I do not know what the structure of the input HTML will be, but I do know that it will vary a lot (before, it was a lot more static).

WPF RichTextBox with Inline HTML - i18n

I have a RichTextBox which I'm trying to use to display a translatable block of text containing hyperlinks. The problem I'm having is that I can't find a way to set the text property without manually coding the <Run>s and <Hyperlink> controls into the content, which isn't translatable. Is there any way of doing this? I tried saving a simple RTF file containing one sentence using Word so I could extract the bits I need, but I ended up with 160 lines of difficult-to-decipher RTF text.
Ideally HTML would be easier, but this doesn't seem to be supported.
I solved this by using the HTML Agility Pack (http://htmlagilitypack.codeplex.com/) to parse out the anchors.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Documents;
using HtmlAgilityPack;

public static IEnumerable<Inline> ParseHtml(string text)
{
    var inlines = new List<Inline>();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(text);
    if (doc.ParseErrors == null || !doc.ParseErrors.Any())
    {
        foreach (var childNode in doc.DocumentNode.ChildNodes)
        {
            switch (childNode.Name.ToLowerInvariant())
            {
                case "a":
                    // anchors become WPF Hyperlink inlines
                    var lnk = new Hyperlink(new Run(childNode.InnerText));
                    lnk.NavigateUri = new Uri(childNode.Attributes["href"].Value);
                    inlines.Add(lnk);
                    break;
                default:
                    // everything else is added as plain text
                    inlines.Add(new Run(childNode.InnerText));
                    break;
            }
        }
    }
    return inlines;
}