Mail-merge HTML from a database into MS Word

Project: I am using VB.NET to build a WinForms database interface and work-automation app.
I am using this editor so that users can enter their text in the database interface. It loads, saves, and shows them what they are working on in the form, and the text also has to be mail-merged into a Word document that is waiting for the content. The first part works well, but how do I get MS Word to recognize the HTML as formatting instead of merging in the tags and text all as plain text?
The tool has two relevant properties: one returns just the text (no markup, i.e. no HTML) and the other returns the full HTML markup. Both are plain strings, which makes them easy to store in the database.
Ideas/directions I can think of:
1) Use the clipboard. I can copy and paste the content straight from the editor window into Word and it works great! But loading from a database is significantly different, even when using the clipboard programmatically. (Maybe I don't understand how to use the clipboard tools.)
2) Maybe there is a library, or a class/function in Word, that can understand the HTML as "mergeable" content?
thanks!
:-Dan

You can use our (SautinSoft) .NET library to convert each of your HTML fragments into a Word (RTF) document.
Then you can merge all of the produced documents into a single Word document; the component also has a function for merging documents.
Download link for the component: http://www.sautinsoft.com/products/html-to-rtf/download.php
Sample code to convert HTML to an RTF document in memory:
Dim h As New SautinSoft.HtmlToRtf
Dim rtfString As String = ""
rtfString = h.ConvertString(htmlString)
Sample code to merge two RTF documents in memory:
Dim h As New SautinSoft.HtmlToRtf
Dim rtfSingle As String = ""
rtfSingle = h.MergeRtfString(rtf1, rtf2)

I ended up using the clipboard to set the text. Here is the code I needed to answer this question:
Clipboard.SetText(Me._Object.Property, TextDataFormat.Rtf)
I just didn't know how to tell the clipboard whether the content was HTML, RTF, etc. It turned out to be simple: the second argument of Clipboard.SetText specifies the format.
:-Dan
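If the stored markup is HTML rather than RTF, the matching format is `TextDataFormat.Html`. As far as I know, Windows expects that clipboard format ("HTML Format", CF_HTML) to start with a small text header giving byte offsets of the HTML and of the fragment, and .NET does not add that header for you. The sketch below shows the offset arithmetic in Python; the function name `build_cf_html` is hypothetical, and the choice of UTF-8 and of the `<html><body>` wrapper is an assumption.

```python
def build_cf_html(fragment: str) -> str:
    """Wrap an HTML fragment in the Windows "HTML Format" (CF_HTML) envelope.

    The header carries byte offsets into the UTF-8 encoded payload, so every
    length below is measured after encoding, not in characters.
    """
    header_template = (
        "Version:0.9\r\n"
        "StartHTML:{:010d}\r\n"
        "EndHTML:{:010d}\r\n"
        "StartFragment:{:010d}\r\n"
        "EndFragment:{:010d}\r\n"
    )
    # Wrapper around the fragment (assumption: a bare html/body is enough)
    prefix = "<html><body>\r\n<!--StartFragment-->"
    suffix = "<!--EndFragment-->\r\n</body></html>"

    def nbytes(s: str) -> int:
        return len(s.encode("utf-8"))

    # Offsets are zero-padded to 10 digits, so the header length is fixed
    start_html = nbytes(header_template.format(0, 0, 0, 0))
    start_fragment = start_html + nbytes(prefix)
    end_fragment = start_fragment + nbytes(fragment)
    end_html = end_fragment + nbytes(suffix)
    header = header_template.format(start_html, end_html, start_fragment, end_fragment)
    return header + prefix + fragment + suffix
```

In VB.NET the same offsets can be computed with `Encoding.UTF8.GetByteCount`, and the resulting string passed to `Clipboard.SetText(payload, TextDataFormat.Html)` before pasting into Word.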

Related

All paragraphs are empty in an opened document in python-docx

I do the following:
from docx import Document
document = Document('text.docx')
document.paragraphs[42].text
It returns '' no matter which index I use, and a for loop to find and replace a word does not work either. But if I save the document with document.save('text2.docx'), the saved document is not empty.
The document is relatively big and contains lots of different formatting, images, tables, and styles.
My task is to find and replace a word in a .docx document and correct the word that follows it, so I would be glad if you could suggest another tool.
I ran into this problem and was able to read the document using docx2txt: https://pypi.org/project/docx2txt/
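One plausible cause (an assumption, not confirmed in the question) is that the text lives where `document.paragraphs` does not look: python-docx lists only the top-level paragraphs of the body, not those inside table cells. A .docx file is a ZIP archive whose main text is in `word/document.xml`, and docx2txt reads the XML inside that archive. The sketch below does something similar with only the standard library; the function name `docx_paragraph_texts` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraph_texts(source):
    """Return the text of every <w:p> in word/document.xml.

    `source` may be a path, a binary file object, or the raw bytes of a .docx.
    Paragraphs inside tables are included because root.iter() walks the whole
    tree; a paragraph nested inside another one (e.g. in a text box) also
    contributes its text to the enclosing paragraph.
    """
    if isinstance(source, (bytes, bytearray)):
        source = BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    texts = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every <w:t> inside it
        texts.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return texts
```

For find-and-replace, note that Word often splits a single word across several runs, so matching one `w:t` element at a time can miss words; joining per paragraph, as above, avoids that when you only need to read the text.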

How to detect HTML in clipboard data using Qt

I'm working on a rich text editor where I need to parse and clean data from the clipboard when appropriate. Whenever the pasted text contains HTML, I clean it up and update the text field with the corrected HTML.
However, when there is no HTML in the clipboard, there is no need to run the HTML cleaning tool.
My first thought was to use a regex to check for any HTML tag, but I'm not sure this is the best solution, as it could cause more headaches in the long run with false positives, etc.
My question is: how can I detect HTML in the clipboard?
Is there an elegant way to solve this problem without resorting to regex?
Maybe one of these functions will help:
bool QDomDocument::setContent(...)
This function reads the XML document from the string text, returning true if the content was successfully parsed; otherwise returns false. Since text is already a Unicode string, no encoding detection is done.
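An addition for mixed clipboard data:
// Extract the <html ...>...</html> part from mixed clipboard data
const QString endTag = QStringLiteral("</html>");
const int start = clipboardString.indexOf("<html", 0, Qt::CaseInsensitive);
const int end = clipboardString.lastIndexOf(endTag, -1, Qt::CaseInsensitive);
QString htmlText;
if (start >= 0 && end > start)
    htmlText = clipboardString.mid(start, end + endTag.size() - start);
// Check validity: setContent() is a member function and returns false for text that is not well-formed XML
if (!htmlText.isEmpty()) {
    QDomDocument doc;
    if (doc.setContent(htmlText)) {
        // ... clean the HTML here
    }
}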
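In Qt itself the first thing to check is the MIME data: `QApplication::clipboard()->mimeData()->hasHtml()` reports whether the clipboard carries a `text/html` entry, without any parsing. If the HTML instead arrives inside plain text, a tolerant HTML tokenizer avoids a hand-written regex and, unlike an XML parser, does not require well-formed markup. The sketch below illustrates that idea in Python with the standard library's `html.parser`; it is a heuristic (anything that looks like `<word>` still counts as a tag), and the names `_TagSniffer` and `contains_html_tag` are hypothetical.

```python
from html.parser import HTMLParser


class _TagSniffer(HTMLParser):
    """Records whether the parser saw at least one start or end tag."""

    def __init__(self):
        super().__init__()
        self.found_tag = False

    def handle_starttag(self, tag, attrs):
        self.found_tag = True

    def handle_endtag(self, tag):
        self.found_tag = True


def contains_html_tag(text):
    """Heuristic: True if `text` contains something HTMLParser tokenizes as a tag.

    A bare '<' that is not followed by a letter (as in '3 < 4') is treated as data.
    """
    sniffer = _TagSniffer()
    sniffer.feed(text)
    sniffer.close()
    return sniffer.found_tag
```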

How can I import HTML content into a PDF template?

I created a PDF template with OpenOffice Draw. It has text boxes, and I can set their values with AcroFields. But I can't import HTML content into the template.
I can convert HTML content to a PDF file, but how can I do it for a template?
My problem is with the template; my HTML content also has to be positioned on the page, for example in the center of the page.
Thanks
I am not quite sure I understand your question, but it seems like you need some kind of template into which you will enter your content.
My first thought is OpenXML as the best fit, but since it is rather complex, you can save some time by using third-party tools.
From my experience, Docentric gives you good value for the money. You can prepare a template in Word and then merge it with data from any source that can be mapped to a .NET object. The resulting document can be converted to PDF or XPS if required.
Templates are created in MS Word (2007 or newer) using a dedicated Docentric add-in for template generation. All MS Word formatting can be applied there. Placeholders are set where the data will appear at runtime.
The process is straightforward, so even end users can design reports. Developers then focus on bringing in data from various sources (database, XML). Check the product documentation for ideas on how to use it.

Using Ruby on Rails to write to an Excel spreadsheet AND include html markup in the text

I have a Ruby on Rails web application in which the user clicks on a link which produces a spreadsheet.
It was easy enough to do this. What I haven't been able to do is get it to write text into the cells formatted according to HTML tags.
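require 'spreadsheet'

book = Spreadsheet::Workbook.new
sheet = book.create_worksheet :name => "My worksheet"
# Single quotes, so the double quotes in the HTML attribute don't end the string
sheet[0,0] = '<strong style="color:red">I want this to appear as red</strong>'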
I understand that you can use a Spreadsheet::Format.new object to set the format for a cell or a row, but in this case I won't know the format ahead of time; I need the spreadsheet to automatically interpret the HTML tags as the text is pulled in from a database.
Any suggestions?
Thanks in advance,
Tim
I am not sure whether these HTML tags can be converted directly into Excel cell styles. I ran into a few problems myself when generating Microsoft Office output from Rails.
Check out this blog post, it might help: http://axlsx.blog.randym.net/2011/12/axlsx-making-excel-reports-with-ruby-on.html
So maybe you can write a function that parses your styling tags and converts them into the Excel-supported parameters described in the blog post.
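The parse-and-convert step can be sketched independently of the spreadsheet library. Below, Python's `html.parser` splits simple inline HTML into (text, style) runs; only `<b>`/`<strong>`, `<i>`/`<em>` and an inline `color` style are recognised, which is an assumption about what the database contains, and `RunExtractor`/`html_to_runs` are hypothetical names. Mapping each run onto a rich-text run or format object in the Excel library (the axlsx post above covers the Ruby side) is not shown.

```python
from html.parser import HTMLParser

# Tags recognised by this sketch (assumption about the stored markup)
BOLD_TAGS = {"b", "strong"}
ITALIC_TAGS = {"i", "em"}


class RunExtractor(HTMLParser):
    """Split simple inline HTML into (text, style) runs."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, style) for each currently open tag
        self.runs = []

    def handle_starttag(self, tag, attrs):
        style = {}
        if tag in BOLD_TAGS:
            style["bold"] = True
        if tag in ITALIC_TAGS:
            style["italic"] = True
        # Pick the color out of an inline style attribute such as "color:red"
        css = dict(attrs).get("style") or ""
        for decl in css.split(";"):
            name, _, value = decl.partition(":")
            if name.strip().lower() == "color" and value.strip():
                style["color"] = value.strip()
        self.stack.append((tag, style))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # The effective style is the union of all enclosing tags' styles
        merged = {}
        for _, style in self.stack:
            merged.update(style)
        self.runs.append((data, merged))


def html_to_runs(markup):
    parser = RunExtractor()
    parser.feed(markup)
    parser.close()
    return parser.runs
```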

Source text contains simple HTML. How can I simply format the text in MS Word?

I've inherited a project that stores basic HTML formatting (e.g. <b> and <i> tags) in a database and writes it out to a Word document. This is my first Word automation assignment, so be gentle!
Currently, a complicated function runs after the document is complete and searches for and replaces these tags. However, because it runs after the document is complete, any logic that is decided at run time (e.g. "insert a page break here") can lead to disastrous results. For example, a large chunk of bold text takes up more space and pushes the break down onto the next page, resulting in a mostly blank page.
I believe the fix is to format the text as it comes from the database so that the positioning logic will be correct. I don't want to call the complicated procedure multiple times, because it is time-consuming and our end users need this document as quickly as possible.
Is there an easy way to write HTML-formatted text to a Word document without having to find and replace every supported tag? I would think that Word has something that can handle this automatically. Thanks in advance if you can point me in the right direction.
Try this:
First, save the HTML you are about to insert as an ordinary ".htm" file.
Then use the Range object and its InsertFile method to insert the ".htm" file at any given position:
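' TempFilePath holds the path of the .htm file saved in the first step
Dim r As Range
Set r = ActiveDocument.Range
' Collapse to an insertion point; a non-collapsed range would be replaced by the file
r.Collapse Direction:=wdCollapseEnd
r.InsertFile FileName:=TempFilePath, Link:=False, ConfirmConversions:=False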
Word should be smart enough to handle the HTML and do all of the format conversion on its own. Use CSS to control the finer details of the formatting.
Delete the ".htm" file when done.
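The save, insert, and delete steps can be wrapped so the temporary file is always cleaned up. This is a minimal Python sketch of the file-handling part only; `temporary_htm` is a hypothetical name, and it assumes Word picks up the encoding from the `Content-Type` meta tag. The `InsertFile` call (via COM or VBA) would go inside the `with` block, and Word must have released the file before it is deleted.

```python
import os
import tempfile
from contextlib import contextmanager

# Minimal HTML wrapper; the meta tag declares UTF-8 (assumption: Word honours it)
_TEMPLATE = (
    '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    "<style>{css}</style></head><body>{body}</body></html>"
)


@contextmanager
def temporary_htm(fragment, css=""):
    """Write `fragment` to a temporary .htm file, yield its path, then delete it."""
    fd, path = tempfile.mkstemp(suffix=".htm")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(_TEMPLATE.format(css=css, body=fragment))
        yield path  # e.g. pass this path to Range.InsertFile
    finally:
        os.remove(path)
```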
Maybe you can use an embedded IE control (IWebBrowser2) to lay out the text, then copy it to the clipboard as rich text, and finally paste it into Word as formatted rich text.