My objective is to convert a Word document to HTML. If the document contains equations, they must be converted to MathML before it is saved as HTML.
So I converted all the equations to MathML using MathType. Now the Word document contains MathML content as well.
If I save it as "Web Page, Filtered", I get HTML, but the MathML tags are converted to entities.
For example:
sample.docx
<mrow>Lj</mrow>
is converted to
<mrow> &C456;</mrow>
EXPECTATION
I need to convert this document to HTML, but any MathML content in it must be left as it is rather than converted. What would be the solution for this?
I need something like an XML <![CDATA[ unparsed char ]]> section. In a Word file, how can I shield the MathML content from being converted?
Must SVG foreignObject contain only XML? That seems to be my reading of the spec, so is it required to put XHTML inside if I wish to use foreignObject? Or can I just copy my regular HTML5-syntax code inside?
The contents of any document must reflect the mime type it's served as.
If you serve the document as image/svg+xml, then the whole document must be valid XML, including any foreignObject child elements, which basically limits you to XHTML. XML parsing stops at the first well-formedness error, so if you paste in HTML you'll likely just get a syntax error printed rather than anything rendered.
If you serve the document as text/html, then the whole document is HTML and any SVG in it is parsed with the HTML parser; no namespace declarations are required. Any foreignObject content in such SVG is parsed as standard HTML, and HTML parsers attempt to render what they can of invalid documents.
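To make the difference concrete, here is a small Java sketch (Java 11+, JDK XML parser only) that feeds two SVG snippets to a namespace-aware XML parser, which approximates what happens when a file is served as image/svg+xml. The class name and the SVG markup are made up for illustration. The XHTML version inside foreignObject parses; the HTML5-style unclosed `<br>` is a fatal well-formedness error.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustration only: checks SVG snippets the way an XML (image/svg+xml) parser would. */
final class ForeignObjectParseDemo {
    // XHTML inside foreignObject: namespaced and well-formed, so an XML parser accepts it
    static final String XHTML_INSIDE =
        "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"200\" height=\"50\">"
        + "<foreignObject width=\"200\" height=\"50\">"
        + "<div xmlns=\"http://www.w3.org/1999/xhtml\">Line one<br/>Line two</div>"
        + "</foreignObject></svg>";

    // HTML5-style syntax: the unclosed <br> is fine for an HTML parser but fatal for XML
    static final String HTML_INSIDE =
        "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"200\" height=\"50\">"
        + "<foreignObject width=\"200\" height=\"50\">"
        + "<div>Line one<br>Line two</div>"
        + "</foreignObject></svg>";

    /** Returns true if the markup is well-formed XML, false on the first fatal parse error. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XHTML inside foreignObject parses: " + isWellFormedXml(XHTML_INSIDE));
        System.out.println("HTML inside foreignObject parses: " + isWellFormedXml(HTML_INSIDE));
    }
}
```

Served as text/html, both snippets would render, because the HTML parser handles the `<br>` either way.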
I'm looking for a way to generate an MS Word document dynamically from machine-readable data (HTML):
MS Word Template document + HTML file = Output Word document.
E.g.:
Content of Word document = I'm the < > document.
Content of HTML file = <td> example </td>
Output Word document = I'm the example document.
What would be a good methodology or tool to start with?
Use the Open XML SDK with the "altChunk" method to embed the HTML file into the WordprocessingML document package at the required target point. Word converts the HTML and integrates it into the document when the file is opened in Word. See, for example:
http://blogs.msdn.com/b/ericwhite/archive/2008/10/27/how-to-use-altchunk-for-document-assembly.aspx
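The answer relies on the .NET Open XML SDK. To show what an altChunk actually is inside the package, here is a rough Java sketch (JDK only, no Open XML SDK) that writes a minimal .docx by hand: a `w:altChunk` element in word/document.xml, an aFChunk relationship pointing at an HTML part, and a content type for .html. It creates a new document rather than filling a template, and the part name `chunk.html`, the relationship id `htmlChunk`, and the class name are arbitrary choices. Because altChunk is block-level, the HTML lands between paragraphs rather than inline in "I'm the < > document."; the conversion itself only happens when Word opens the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Sketch: a minimal .docx whose body contains an altChunk that points at an HTML part. */
final class AltChunkDocx {
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    private static final String RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships";
    private static final String DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    static byte[] build(String textBefore, String htmlFragment, String textAfter) {
        String contentTypes = DECL
            + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
            + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
            + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
            + "<Default Extension=\"html\" ContentType=\"text/html\"/>"
            + "<Override PartName=\"/word/document.xml\" ContentType=\"application/"
            + "vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/></Types>";
        String packageRels = DECL + "<Relationships xmlns=\"" + RELS_NS + "\">"
            + "<Relationship Id=\"rId1\" Type=\"" + R + "/officeDocument\" Target=\"word/document.xml\"/>"
            + "</Relationships>";
        // aFChunk relationship: links r:id="htmlChunk" to the HTML part word/chunk.html
        String documentRels = DECL + "<Relationships xmlns=\"" + RELS_NS + "\">"
            + "<Relationship Id=\"htmlChunk\" Type=\"" + R + "/aFChunk\" Target=\"chunk.html\"/>"
            + "</Relationships>";
        // altChunk is block-level: it sits between paragraphs, not inside a run
        String document = DECL + "<w:document xmlns:w=\"" + W + "\" xmlns:r=\"" + R + "\"><w:body>"
            + paragraph(textBefore) + "<w:altChunk r:id=\"htmlChunk\"/>" + paragraph(textAfter)
            + "</w:body></w:document>";
        String html = "<html><body>" + htmlFragment + "</body></html>";

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            put(zip, "[Content_Types].xml", contentTypes);
            put(zip, "_rels/.rels", packageRels);
            put(zip, "word/document.xml", document);
            put(zip, "word/_rels/document.xml.rels", documentRels);
            put(zip, "word/chunk.html", html);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static String paragraph(String text) {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<w:p><w:r><w:t xml:space=\"preserve\">" + escaped + "</w:t></w:r></w:p>";
    }

    private static void put(ZipOutputStream zip, String name, String content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    /** Lists the part names in the package, e.g. to check the result. */
    static List<String> entryNames(byte[] docx) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docx))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = build("I'm the", "<p>example</p>", "document.");
        Files.write(Paths.get("output.docx"), docx); // open in Word to see the imported HTML
        System.out.println(entryNames(docx));
    }
}
```

With a real template you would instead open its package, swap the placeholder paragraph for the `w:altChunk` element, and add the relationship and HTML part; the Open XML SDK takes care of these package details for you.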
I am working on a project that requires converting specific paragraphs of a Word document to HTML. I will have the Range object of the paragraph(s), and from that range I can get the WordOpenXML, which I want to convert to HTML. (The result should not have html, head, or body tags, since it is not a full document but just a small HTML chunk.)
I have read Eric White's Open XML articles; he wrote great articles on this topic, and PowerTools for Open XML has an HTML converter that converts an entire document to HTML. My requirement is to convert a specific paragraph or range to HTML. Can anyone point me in the right direction?
For example, if a Word document has
This is para1.
This is para2.
This is para3.
My requirement is to convert para2, which I have as a paragraph object. So basically I am looking to write a function like:
public string WordOpenXMLToHtml( string sWordOpenXML) {
// do the transformation
return sHtml;
}
You could try the HtmlConverter class. More information: "Transforming Open XML WordprocessingML to XHTML Using the Open XML SDK 2.0".
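HtmlConverter handles styles, numbering, tables, and much more. If only the text of a paragraph or two is needed, a deliberately naive Java sketch (JDK DOM only) of the requested function could look like the following: it emits one `<p>` per `w:p` element, copies `w:t` text (escaped), turns `w:br` into `<br/>`, and wraps bold runs in `<strong>`. Everything else (styles, lists, tables, hyperlink targets, nested paragraphs in text boxes) is ignored, and `wordOpenXmlToHtml` is just a Java spelling of the signature in the question. Because it searches by namespace, it should also work on a Flat OPC string such as the one `Range.WordOpenXML` returns, but that is an untested assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Naive sketch: turns the w:p elements of a WordprocessingML string into an HTML chunk. */
final class ParagraphToHtml {
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String wordOpenXmlToHtml(String wordOpenXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(wordOpenXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed WordOpenXML", e);
        }
        StringBuilder html = new StringBuilder(); // no html/head/body, just the chunk
        NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            html.append("<p>");
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                appendRun(html, (Element) runs.item(j));
            }
            html.append("</p>");
        }
        return html.toString();
    }

    private static void appendRun(StringBuilder html, Element run) {
        StringBuilder text = new StringBuilder();
        for (Node child = run.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (!W.equals(child.getNamespaceURI())) continue;
            if ("t".equals(child.getLocalName())) text.append(escape(child.getTextContent()));
            else if ("br".equals(child.getLocalName())) text.append("<br/>");
        }
        if (text.length() == 0) return;
        boolean bold = isBold(run);
        html.append(bold ? "<strong>" : "").append(text).append(bold ? "</strong>" : "");
    }

    // <w:b/> in the run properties means bold; w:val="0" or "false" switches it off
    private static boolean isBold(Element run) {
        NodeList b = run.getElementsByTagNameNS(W, "b");
        if (b.getLength() == 0) return false;
        String val = ((Element) b.item(0)).getAttributeNS(W, "val");
        return !("0".equals(val) || "false".equals(val));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String para2 = "<w:p xmlns:w=\"" + W + "\"><w:r><w:t xml:space=\"preserve\">This is </w:t></w:r>"
            + "<w:r><w:rPr><w:b/></w:rPr><w:t>para2</w:t></w:r><w:r><w:t>.</w:t></w:r></w:p>";
        System.out.println(wordOpenXmlToHtml(para2));
    }
}
```

For anything beyond plain and bold text, the HtmlConverter route above is the more realistic option.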
I have a servlet that prints the contents of various files. But when I try to print an .xml file, my servlet page doesn't show anything, because the browser treats the XML tags as HTML and parses them instead of printing them. I want these tags to be printed. I read the file line by line, and each line is stored in the variable line.
If you want to print XML content in your HTML page, you can use the StringEscapeUtils.escapeHtml() function from the Apache Commons Lang library (escapeHtml4() in Commons Lang 3) to write the XML file's contents to your HTML page:
response.setContentType("text/html;charset=UTF-8");
PrintWriter writer = response.getWriter();
writer.write("<html><head></head><body><pre>");
// xmlContent holds the file's text; escaping turns <, >, & and quotes into entities so the tags are shown, not parsed
writer.write(StringEscapeUtils.escapeHtml(xmlContent));
writer.write("</pre></body></html>");
If you are attempting to display XML as content in an HTML document:
Browsers can't tell the difference between a < that the author intends to mean "start of tag" and one that the author intends to mean "render this".
You need to represent it as &lt; if you want it to appear as data.
The answer to "htmlentities equivalent in JSP?" explains how to convert a string of text into a string of HTML.
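If adding Commons Lang just for this is not wanted, the escaping is small enough to write by hand. A minimal Java sketch (the class and method names are hypothetical) that escapes each line as it is read, as the servlet in the question does, and wraps the result in `<pre>` so the XML's line breaks survive:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: renders the text of an XML file so a browser shows the tags. */
final class XmlAsHtmlText {
    /** Replaces the characters HTML treats as markup with character references. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // so existing entities are shown literally
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Reads the XML line by line, escapes each line, and wraps everything in <pre>. */
    static String toHtmlPage(BufferedReader reader) {
        StringBuilder page = new StringBuilder("<html><head></head><body><pre>");
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(escapeHtml(line)).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return page.append("</pre></body></html>").toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n<root attr=\"1\"/>";
        System.out.println(toHtmlPage(new BufferedReader(new StringReader(xml))));
    }
}
```

In the servlet you would pass a BufferedReader over the .xml file and write the returned string to response.getWriter() with a text/html content type.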
If you are attempting to output an XML document instead of an HTML document:
You need to specify an XML content type (such as application/xml) instead of an HTML content type.
See "How to set the content type on the servlet" for an explanation.
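In a servlet this comes down to calling response.setContentType("application/xml") before writing the body. To show the effect without a servlet container, this Java sketch (Java 11+) uses the JDK's built-in com.sun.net.httpserver server as a stand-in and sets the equivalent Content-Type header; the /exchange.xml path and the class name are made up. The client (here HttpURLConnection) receives application/xml, so a browser would treat the response as XML rather than render it as HTML.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Stand-in for a servlet: serves an XML string with an XML content type. */
final class XmlContentTypeDemo {
    static HttpServer start(String xml) {
        try {
            // Port 0 lets the OS pick a free port
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/exchange.xml", exchange -> {
                byte[] body = xml.getBytes(StandardCharsets.UTF_8);
                // Same effect as response.setContentType("application/xml;charset=UTF-8") in a servlet
                exchange.getResponseHeaders().set("Content-Type", "application/xml; charset=UTF-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fetches the XML and returns the Content-Type header the client received. */
    static String fetchContentType(int port) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI
                .create("http://127.0.0.1:" + port + "/exchange.xml").toURL()
                .openConnection(Proxy.NO_PROXY);
            try (InputStream in = conn.getInputStream()) {
                in.readAllBytes();
                return conn.getContentType();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start("<?xml version=\"1.0\"?><exchange/>");
        try {
            System.out.println(fetchContentType(server.getAddress().getPort()));
        } finally {
            server.stop(0);
        }
    }
}
```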
I have an XML file ("exchange.xml") and an XSL file ("exchange.xsl") for transforming it. exchange.xml has the following line:
<?xml-stylesheet type="text/xsl" href="exchange.xsl"?>
This indicates that exchange.xsl should be used to transform the file.
It is transformed as expected if I load exchange.xml in a browser. However, I want to embed exchange.xml in an HTML file that otherwise contains no XML. How do I do this? As far as I understand, when you embed XML in HTML through the xml element's src attribute, you then have to process it yourself for it to be displayed. I want it to be displayed automatically using the XSL file I've already created for it.
It has also come to my attention that some browsers (e.g. the Android default browser) won't apply the XSL file to the XML and require a server-side transform. Is it possible to embed my XML in an HTML file but use a server-side transform?
As far as I am aware, HTML 4 as specified by the W3C does not provide any way to embed XML markup in an HTML 4 document. Of the browsers I know, only IE has an extension to HTML 4 that allows this, the so-called XML data islands (http://msdn.microsoft.com/en-us/library/ms766512%28v=VS.85%29.aspx), where IE's HTML parser recognizes an xml element that can contain XML markup as its content or link to it with the src attribute. So unless you want to use something IE-specific like XML data islands, the cross-browser way to load XML as data inside an HTML document is to use XMLHttpRequest with JavaScript.
You can use PHP's XSL extension (the XSLTProcessor class) to transform the XML and include the result in your page via PHP.
e.g. showxml.php:
<?php
function processXML($file,$styles){
... //example here
}
?>
<html>
...
<div id="xml">
<?php echo processXML('foo.xml','bar.xsl'); ?>
</div>
...
</html>
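The same server-side transform can be done in Java with the JDK's built-in javax.xml.transform (JAXP) API, which avoids depending on the browser's XSLT support mentioned in the question. A minimal sketch, assuming the XML and XSL are available as strings; the processXml name mirrors the PHP example, and the rates/rate markup is invented because the question does not show exchange.xml:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Server-side XSLT: XML + XSL in, HTML fragment out (JDK's built-in JAXP processor). */
final class ServerSideXslt {
    // Hypothetical stylesheet for a made-up <rates><rate currency="...">...</rate></rates> format
    static final String DEMO_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'><ul><xsl:for-each select='rates/rate'>"
        + "<li><xsl:value-of select='@currency'/>: <xsl:value-of select='.'/></li>"
        + "</xsl:for-each></ul></xsl:template>"
        + "</xsl:stylesheet>";

    static String processXml(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rates><rate currency='EUR'>1.08</rate></rates>";
        // A servlet or JSP would write this inside <div id="xml"> ... </div>
        System.out.println(processXml(xml, DEMO_XSL));
    }
}
```

Because the stylesheet is passed in explicitly, the xml-stylesheet processing instruction in the XML file is not needed on this path.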