How to convert a webpage to an XML document using Java? - html

Assuming the webpage is coded with correct tags, how can I convert it to an XML file? I think most webpages can be viewed as a DOM tree, so how can I write that tree out as XML?

JTidy reads HTML and presents it as a DOM. Once you have your HTML as a DOM you should be able to process it and write it out as XML.
To output a DOM, see the example code here and the XMLSerializer in particular.
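As an alternative to XMLSerializer, the standard javax.xml.transform API can write a DOM out as XML. The following is a minimal sketch, not the code the answer links to. The class name DomToXml and the helper sampleDom() are hypothetical; sampleDom() builds a tiny DOM by hand as a stand-in for the Document that an HTML parser such as JTidy would return.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToXml {
    /** Serializes any org.w3c.dom.Document to an XML string using an identity transform. */
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            // Force XML output; a root element named "html" may otherwise select HTML output.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize the DOM", e);
        }
    }

    /** Hypothetical stand-in for a parsed page; in practice the Document comes from the HTML parser. */
    static Document sampleDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("Hello");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDom()));
    }
}
```

Because serialize() accepts any org.w3c.dom.Document, it should not matter which parser produced the tree.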

Related

XML -> XSL -> HTML edit file, and SAVE changes in xml WITHOUT asp

I have an XML file that I've transformed with xsl and loaded into a browser as html. That html is editable using a rich text editor by the user. When they're done I need to transform their html edits back to the original xml document.
One solution I've found is using ASP: http://www.w3schools.com/xsl/xsl_editxml.asp
But I'm using Apache and I don't have ASP installed, and I'm wondering if there is an easier/better way to do this without using ASP.
Or is ASP the only way?
Thanks =)
The solution that you found doesn't do what you describe. It only presents the data from the XML as a form and lets the user edit the values. That's not very complex, and you can do it with pretty much any other server-side language, such as PHP.
What you describe, on the other hand, is quite complex. It involves examining the XSL and the HTML to identify which parts of the HTML were created from specific XML data, so that changes can be reflected back. That's not something a simple ASP script like that one does.
If you design an XSL transformation for both directions, XML to HTML and HTML to XML, comparing the source XML and resulting user XML should be a much easier problem to solve.
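A rough sketch of the two-direction idea, written in Java with the standard javax.xml.transform API. Everything here is an assumption made for illustration: the class name RoundTripXslt, the tiny note/title document, and both stylesheets. It also assumes the edited HTML is still well-formed XML, which a rich text editor does not guarantee.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RoundTripXslt {
    // Hypothetical stylesheet: XML -> HTML fragment.
    static final String TO_HTML =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/note'>"
      + "<div class='note'><h1><xsl:value-of select='title'/></h1></div>"
      + "</xsl:template></xsl:stylesheet>";

    // Hypothetical inverse stylesheet: HTML fragment -> XML.
    static final String TO_XML =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/div'>"
      + "<note><title><xsl:value-of select='h1'/></title></note>"
      + "</xsl:template></xsl:stylesheet>";

    /** Applies the given XSLT source to the given well-formed input and returns the result. */
    static String transform(String xslt, String input) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String html = transform(TO_HTML, "<note><title>Hi</title></note>");
        String edited = html.replace("Hi", "Hello"); // simulate the user's edit
        System.out.println(transform(TO_XML, edited));
    }
}
```

The inverse stylesheet only works because the forward one emits a predictable structure, so the two are best designed together.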

Generate XML from HTML

I'm trying to generate XML from an HTML page (URL). The page has a form that I want to get into an XML file, but it's too long and I'm looking for an easier way to do it.
Is there a method to generate XML with all the fields, etc., from an HTML page?
You can also use an HTML parser and print out the objects/array as XML.
Try this: http://sourceforge.net/projects/html2xml/
You can try the free .NET class library SgmlReader, which can load HTML into an XmlDocument. This in turn can be saved as XML.

Embedding an external XML file in HTML and using an XSL file to parse it?

I have an XML file ("exchange.xml") and an XSL file ("exchange.xsl") for parsing it. exchange.xml has the following line:
<?xml-stylesheet type="text/xsl" href="exchange.xsl"?>
Indicating that exchange.xsl should be used to parse the file.
It parses as expected if I load exchange.xml in a browser. However, I want to have exchange.xml embedded in an HTML file which otherwise does not have any XML in it. How do I do this? As far as I understand it, when you embed XML in HTML with the XML tag's SRC ID, you then have to parse it yourself for it to be displayed. I want it to automatically be displayed using the XSL file I've already created for it.
It has also come to my attention that some browsers (e.g. Android default browser) won't parse the XML with the XSL file, and require a server-side transform. Is it possible to embed my XML into an HTML file but use a server-side transform?
As far as I am aware, HTML 4 as specified by the W3C does not provide any way to embed XML markup in an HTML 4 document. Of the browsers I know, only IE has an extension to HTML 4 that allows it, the so-called XML data islands (http://msdn.microsoft.com/en-us/library/ms766512%28v=VS.85%29.aspx), where IE's HTML parser recognizes an xml element that can contain XML markup as its content or link to it with the src attribute. So unless you want to use something IE-specific like XML data islands, the cross-browser way to load XML as data inside an HTML document is to use XMLHttpRequest with JavaScript.
You can use PHP's XSL class to transform the XML and include the result in your page via PHP.
e.g. showxml.php:
<?php
function processXML($file,$styles){
... //example here
}
?>
<html>
...
<div id="xml">
<?php echo processXML('foo.xml','bar.xsl'); ?>
</div>
...
</html>

Extracting HTML tag properties and values

Is it possible to extract the properties and values in a HTML file and export them to a XML file?
Sure.
An HTML file IS (more or less) an XML file.
You can use XSLT (http://www.w3.org/TR/xslt) to transform HTML/XML into other HTML/XML.
Or you can use any other language, such as PHP with xmldom.
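The same idea can be sketched in Java using only the standard DOM and XPath APIs. This sketch assumes the input is well-formed XHTML; ordinary HTML would first need a tolerant HTML parser. The class name AttributeExport and the output element names properties/property are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeExport {
    /** Collects every attribute in a well-formed XHTML string into a flat XML listing. */
    static String exportAttributes(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document page = builder.parse(new InputSource(new StringReader(xhtml)));

            // "//@*" selects every attribute node in the document.
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//@*", page, XPathConstants.NODESET);

            Document out = builder.newDocument();
            Element root = out.createElement("properties");
            out.appendChild(root);
            for (int i = 0; i < attrs.getLength(); i++) {
                Attr attr = (Attr) attrs.item(i);
                Element prop = out.createElement("property");
                prop.setAttribute("element", attr.getOwnerElement().getTagName());
                prop.setAttribute("name", attr.getName());
                prop.setAttribute("value", attr.getValue());
                root.appendChild(prop);
            }

            // Serialize the result document with an identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            t.transform(new DOMSource(out), new StreamResult(writer));
            return writer.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("Could not export attributes", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"x.html\" title=\"X\">link</a></body></html>";
        System.out.println(exportAttributes(page));
    }
}
```

Each attribute becomes one property element that records the owning tag, the attribute name, and its value.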

How to show XSL-converted XML as a part of an HTML page?

Can I embed an XML file in HTML without using iFrames?
I want to show XSL-transformed XML (which is HTML as a result of the transformation) as part of my HTML document. I hope this makes it clearer.
If my description of problem is unclear, please tell me and I will try to explain it more.
You can use the browser's built-in XSL transformation routines to convert an XML string into an XMLDocument or HTML output, which can then be inserted into any page element.
The steps can be summarized as:
1. Load an XML string from a resource (or as the result of an AJAX request).
2. Load the XML string into an XML document object. The code differs between browsers: IE uses the MSXML DOMDocument ActiveXObject, while Mozilla uses its built-in implementation to create a Document. Chrome, on the other hand, uses the built-in XMLHttpRequest object as the only available XML document object.
3. Load the XSL document similarly and set its arguments.
4. Transform the XML and obtain the output as a string.
5. Apply the string output to any page element.
Note that the code differs for each browser, so it may be simpler to use a public JS framework such as jQuery or Prototype.
You will need to use HTML entities. For example, this is how you would write a name tag:
&lt;name&gt;
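If the markup is generated by a program, the entity escaping can be done in code. Here is a minimal sketch in Java; the class name HtmlEscape is hypothetical, and only the four characters that matter most in HTML text and attribute values are handled.

```java
class HtmlEscape {
    /** Replaces the characters that HTML would otherwise interpret as markup. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<name>")); // prints &lt;name&gt;
    }
}
```

Because each character is handled in a single pass, an ampersand in the input is escaped exactly once.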