Extracting HTML tag properties and values - html

Is it possible to extract the properties (attributes) and their values from an HTML file and export them to an XML file?

Sure.
An HTML file is, more or less, an XML file (strictly speaking, only well-formed XHTML is; ordinary HTML may need tidying first).
You can use XSLT (http://www.w3.org/TR/xslt) to transform HTML/XML into other HTML/XML.
Or you can use any other language, such as PHP with xmldom.
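As a rough illustration of the "any other language" route, here is a minimal Java sketch. It assumes the input is already well-formed XHTML (so a plain XML parser accepts it) and has no DOCTYPE. The class name AttributeExport and the output element names tags, tag and property are made up for this example; they do not come from any standard.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeExport {
    /** Lists every attribute of every element in the XHTML text as a small XML document. */
    static String export(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document in = builder.parse(new InputSource(new StringReader(xhtml)));

        // Hypothetical output layout: <tags><tag name="a"><property name=".." value=".."/></tag></tags>
        Document out = builder.newDocument();
        Element root = out.createElement("tags");
        out.appendChild(root);

        NodeList all = in.getElementsByTagName("*"); // all elements, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element src = (Element) all.item(i);
            NamedNodeMap attrs = src.getAttributes();
            if (attrs.getLength() == 0) {
                continue; // skip elements that carry no attributes
            }
            Element tag = out.createElement("tag");
            tag.setAttribute("name", src.getTagName());
            for (int j = 0; j < attrs.getLength(); j++) {
                Attr a = (Attr) attrs.item(j);
                Element prop = out.createElement("property");
                prop.setAttribute("name", a.getName());
                prop.setAttribute("value", a.getValue());
                tag.appendChild(prop);
            }
            root.appendChild(tag);
        }

        // An identity Transformer serializes the result DOM to text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(out), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/home\" class=\"nav\">Home</a></body></html>";
        System.out.println(export(page));
    }
}
```

Real-world HTML is often not well-formed, in which case it would have to go through an HTML parser or tidier before this step.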

Related

Apply an XSL Transform to an XML REST result without being able to specify the XSL file path in the XML

I have a simple XSLT file that I would like to apply to an XML file retrieved in the browser, to convert it to HTML and present it nicely. However, I am unable to modify the XML beforehand to specify the XSL stylesheet.
Is there a way I can force the XSL to be applied to the XML file without specifying the stylesheet in the source file itself?
Do I need some kind of 'proxy' HTML file to host the source URL and the transform? How would I go about doing this?
In Java, you would parse the XML into an org.w3c.dom.Document object and construct a javax.xml.transform.dom.DOMSource on it. Then construct a javax.xml.transform.Transformer for the XSLT file and call its transform method, passing the DOMSource and a Result (for example a StreamResult). The Result will receive the transformed output.
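The Java steps above can be sketched roughly as follows. This is only a minimal example using the standard javax.xml APIs; the class name XsltDemo, the sample XML and the sample stylesheet are invented for illustration, and in practice the XML would come from the REST call rather than a string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XsltDemo {
    /** Applies the stylesheet text to the XML text and returns the transformed output. */
    static String transform(String xml, String xslt) throws Exception {
        // 1. Parse the XML into an org.w3c.dom.Document.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        // 2. Build a Transformer from the stylesheet; the XML itself needs no
        //    xml-stylesheet processing instruction.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));

        // 3. Transform the DOMSource into a Result (here a string buffer).
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item>apple</item><item>pear</item></items>";
        String xslt =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html'/>"
            + "<xsl:template match='/items'><ul>"
            + "<xsl:for-each select='item'><li><xsl:value-of select='.'/></li></xsl:for-each>"
            + "</ul></xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(xml, xslt));
    }
}
```

Run on a server (for instance behind a small servlet), this would act as the 'proxy' the question asks about: fetch the XML, transform it, and return the HTML.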

Generate XML from HTML

I'm trying to generate an XML file from an HTML page (URL). The website has a form that I want to get into an XML file, but it's too long and I'm looking for an easier way to do it.
Is there a method to generate XML with all the fields, etc., from HTML?
You can also use an HTML parser and print out the resulting objects/arrays as XML.
Try this: http://sourceforge.net/projects/html2xml/
You can try the free .NET class library SgmlReader, which can load HTML into an XmlDocument. This in turn can be saved as XML.

linqtoxml - insert string literal into xml file

I am using LINQ-to-XML. I am building a small program that helps parse HTML. I'd like to save the HTML tags into an XML file, but I don't want the XML file to check the validity of the entered HTML elements.
How can I just enter a simple string literal (a pretty long one)?
Maybe using a CDATA construct could help you out, see w3schools.com
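To show what the CDATA idea looks like in practice, here is a small sketch. It is written in Java with the standard DOM API rather than LINQ-to-XML (in LINQ-to-XML the corresponding node type is XCData). The class name CdataDemo and the element name snippet are made up for the example, and it assumes the raw HTML does not itself contain the sequence "]]>".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CdataDemo {
    /** Stores raw (possibly malformed) HTML as the character data of one XML element. */
    static String wrapInCdata(String rawHtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("snippet"); // hypothetical element name
        doc.appendChild(root);
        // The parser never interprets CDATA content as markup, so the HTML is not validated.
        root.appendChild(doc.createCDATASection(rawHtml));

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Reads the stored string back from the XML text. */
    static String readText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String raw = "<p>unclosed <b>tag & stuff</p>";
        String xml = wrapInCdata(raw);
        System.out.println(xml);
        System.out.println(raw.equals(readText(xml)));
    }
}
```

Note that plain text nodes would also work, since the serializer escapes < and & for you; CDATA mainly keeps the stored HTML readable in the file.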

Embedding an external XML file in HTML and using an XSL file to parse it?

I have an XML file ("exchange.xml") and an XSL file ("exchange.xsl") for parsing it. exchange.xml has the following line:
<?xml-stylesheet type="text/xsl" href="exchange.xsl"?>
This indicates that exchange.xsl should be used to parse the file.
It parses as expected if I load exchange.xml in a browser. However, I want to have exchange.xml embedded in an HTML file which otherwise does not have any XML in it. How do I do this? As far as I understand it, when you embed XML in HTML with the XML tag's SRC attribute, you then have to parse it yourself for it to be displayed. I want it to be displayed automatically using the XSL file I've already created for it.
It has also come to my attention that some browsers (e.g. the Android default browser) won't parse the XML with the XSL file, and require a server-side transform. Is it possible to embed my XML in an HTML file but use a server-side transform?
As far as I am aware, HTML 4 as specified by the W3C does not provide any way to embed XML markup in an HTML 4 document. Of the browsers I know, only IE has an extension to HTML 4 that allows it: the so-called XML data islands (http://msdn.microsoft.com/en-us/library/ms766512%28v=VS.85%29.aspx), where IE's HTML parser recognizes an xml element that can contain XML markup as its content or link to it with the src attribute. So unless you want to use something IE-specific like XML data islands, the cross-browser way to load XML as data inside an HTML document is to use XMLHttpRequest with JavaScript.
You can use PHP's XSL class (XSLTProcessor) to transform the XML on the server and include the result in your page via PHP.
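e.g. showxml.php:
<?php
// Transform an XML file with an XSL stylesheet and return the result as a string.
// Example body only: a minimal sketch that assumes PHP's XSL extension is enabled.
function processXML($file, $styles) {
    $xml = new DOMDocument();
    $xml->load($file);
    $xsl = new DOMDocument();
    $xsl->load($styles);
    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    return $proc->transformToXml($xml);
}
?>
<html>
...
<div id="xml">
<?php echo processXML('foo.xml', 'bar.xsl'); ?>
</div>
...
</html>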

How to convert a webpage to an XML document using Java?

Assume the webpage is coded with correct tags. I think most webpages can be viewed as a DOM tree, so how can I convert one to an XML file?
JTidy reads HTML and presents it as a DOM. Once you have your HTML as a DOM you should be able to process it and write it out as XML.
To output a DOM, see the example code here and the XMLSerializer in particular.
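For the "write it out as XML" step, a possible sketch is below. It swaps in the standard javax.xml.transform identity Transformer instead of XMLSerializer, and it assumes you already have an org.w3c.dom.Document (from JTidy or any other parser). The class name DomToXml is made up, and the demo builds its DOM from a well-formed string only to stay self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomToXml {
    /** Serializes any DOM node (for example a document produced by an HTML tidier) as XML text. */
    static String toXml(Node node) throws Exception {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the DOM that JTidy would return for a real page (assumption).
        String page = "<html><head><title>Hi</title></head><body><p>Hello</p></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(page)));
        System.out.println(toXml(doc));
    }
}
```

To write straight to a file instead of a string, pass a StreamResult built from a java.io.File.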