So I have a servlet which prints content of various files. But when I want to print .xml file my servlet page doesn't print anything, because page uses this xml tags as html and is parsing them istead of printing. And I want to print this tags. I am reading file line by line and lines are stored in variable line.
If you want to print xml content in your HTMl page, you can use StringEscapeUtils.escapeHtml() function from Apache commons lang library to write xml file contents to your HTML page
PrintWriter writer = response.getWriter();
writer.write("<html><head></head><body>");
writer.write(StringEscapeUtils.escapeHtml(xmlContent);
writer.write("</body></html>");
If you are attempting to Display XML as content in an HTML document:
Browsers can't tell the difference better a < that the author intends to mean "Start of tag" and one that the author intends to mean "Render this".
You need to represent it as < if you want it to appear as data.
The answer to htmlentities equivalent in JSP? explains how to convert a string of text into a string of HTML.
If you are attempting to Output an XML document instead of an HTML document:
You need to specify an XML content type (such as application/xml) instead of an HTML content-type.
See How to set the content type on the servlet for an explanation.
Related
Like this question, extract text from xml tags in an XML file using apach tika parser
I want to extract all text from text based files, including tagged content, the tags themselves, and other text in XML/HTML elements.
I've tried XML (application/xml), and HTML (text/html) and seen that AutoDetectParser returns less than the full text content.
I've also tried YAML (text/plain), and JSON (text/plain) which do return the full text content.
I understand that I can't do XML or HTML using the AutoDetectParser. What I can't find documented is a list of what types of files would need special handling.
To get full text content (even if that means a complete 'raw' copy of the file):
1. What Mimetypes should be parsed using a TXTParser?
2. What Mimetypes should be parsed using other parsers?
Basically, I'm asking what Mimetypes does the AutoDetectParser return less than the full text content?
Thanks
EDIT
My use case is to be able to extract text and metadata from a wide variety of input file formats including txt, xml, html, doc(x), ppt(x), pdf, ...
Essentially, I want to be able to handle any file type Tika can handle.
I am using code like this
AutoDetectParser parser = new AutoDetectParser();
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
context.set(Parser.class, parser);
try (InputStream stream = new FileInputStream(fileToExtract)){
parser.parse(stream, handler, metadata, context);
} catch ... {
}
I see the same results for XML files as the question referenced above.
What I am trying to find out is: where is it documented when the combination of AutoDetectParser and BodyContentHandler will return less than the full text of the input file.
When, or for what Mimetypes, do I need to switch the Parser and/or ContentHandler?
I don't see this information clearly documented, and I am hoping to avoid a trail and error approach.
I'm making an api call to a site using the code as shown below :
$xmlData = file_get_contents("http://isbndb.com/api/books.xml?access_key=XXXXXX&index1=isbn&value1=0596002068");
echo $xmlData;
However xmlData when displayed on the browser is auto parsed to HTML. For e.g. The element <title> of the returned XML which is actually a book title is converted to HTML essentially becoming the page title, and the other XML elements are displayed as plain text without the tags. I want the client side XMLHttpRequest Object to get raw XML data from the server side.
Why does this happen and how do I ensure that XML is not auto parsed?
PHP just sees it as text. For instance, do echo "<b>Bold</b>"; and it will "automatically" be in bold. It is the browser that processes the HTML and renders it.
This is what htmlspecialchars is for.
This got nothing to do with php. you spit out elements which browser interprets as HTML (that's why it sets title). Build your html page right, use <pre> tags around your content, or. when needed, send your content with correct content-type header (like text/plain to display your xml for viewing or text/xml for other purposes) so it will not parse your data as html.
Im trying to generate a xml from a html (url). The html website have a formulary that i want to get into a xml archive, but its too long and im searching a way to do it easier.
There is a method to generate a xml with all the fields, etc, from a html?
you can also use an html parser and print out the objects / array as xml
try this: http://sourceforge.net/projects/html2xml/
You can try the free dotnet-classlibrary SgmlReader that can load html into a xmldocument. This in turn can be saved as xml.
I have an XML file ("exchange.xml") and an XSL file ("exchange.xsl") for parsing it. exchange.xml has the following line:
<?xml-stylesheet type="text/xsl" href="exchange.xsl"?>
Indicating that exchange.xsl should be used to parse the file.
It parses as expected if I load exchange.xml in a browser. However, I want to have exchange.xml embedded in an HTML file which otherwise does not have any XML in it. How do I do this? As far as I understand it, when you embed XML in HTML with the XML tag's SRC ID, you then have to parse it yourself for it to be displayed. I want it to automatically be displayed using the XSL file I've already created for it.
It has also come to my attention that some browsers (e.g. Android default browser) won't parse the XML with the XSL file, and require a server-side transform. Is it possible to embed my XML into an HTML file but use a server-side transform?
As far as I am aware HTML 4 as specified by the W3C does not provide any way to embed XML markup in an HTML 4 document. And of the browsers I know only IE has an extension to HTML 4 that allows that, the so called XML data islands, http://msdn.microsoft.com/en-us/library/ms766512%28v=VS.85%29.aspx, where IE's HTML parser recognizes an xml element that can contain XML markup as its content or link to it with the src attribute. So unless you want to use something IE specific like an XML data islands the cross-browser way to load XML as data inside of an HTML document is to use XMLHttpRequest with Javascript.
You can use PHP: XSL class for transforming XML and include it via PHP in your page.
eg: showxml.php
<?php
function processXML($file,$styles){
... //example here
}
<html>
...
<div id="xml>
<?php echo processXML('foo.xml','bar.xsl'); ?>
</div>
...
</html>
Can I embed an XML file in HTML without using iFrames?
I want to show XSL-transformed XML(which, is HTML as a result of transformation) as a part of my HTML document. Hope this makes it clearer.
If my description of problem is unclear, please tell me and I will try to explain it more.
You can easily use browser based XSL transformation routines to convert an XML string into an XMLDocument or HTML output that can then be applied into any page element.
The steps could be briefly summarized as:
Load an XML string from a resource (or as the result of an AJAX hit).
Load the XML document into an Xml document object (code differs for Browsers - IE uses the ActiveXObject MSXML - DOMDocument, while Mozilla uses the built-in implementation to create a Document. Chrome on the other hand uses the built-in XmlHttpRequest object as the only available XML document object.)
Load the XSL document similarly and set its arguments.
Transform the XML and obtain output as a string.
Apply the string output to any page element.
Note that the code differs for each browser so it may be simpler to use a public JS framework such as JQuery or Prototype.
You will need to use html entities. For example this is how you would write a name tag
<name>.
More reading here