HTML - xml data islands

HTML - xml data islands - html

I am designing a web app and I intent to embed data on an xml island so that I can dynamically render it on an HTML table on the client-side based on options the users will select.
I have the broad concepts, but I need pointers on how to use DOM in navigating my xml. And how to update my xml island possibly for posting back to the server?
Please any links to online resources or a quick advice will be very appreciated.
NB: I understand most of the dynamic HTML concepts and server and client side stuff, so don't shy being very technical in your response:)

In W3C HTML there are no XML data islands (unless you're referring to external XML file linked via frames loaded using Javascript), but you can re-use HTML elements and insert metadata in class, title (if you care about HTML4 validity), data-* (HTML5) or your custom attributes.
For DOM navigation you've got DOM Core, like element.childNodes, .nextSibling, .getAttribute(), etc.
DOM can be verbose and tedious to use (e.g. when looking for elements in DOM you have to be careful to skip text nodes), so there are JS libraries like jQuery and Prototype built on top of it that offer more convenient API.
If you intend to a lot of DOM transformations, then Javascript API for XPath and XSLT processor will be handy.

What you describe can be done with XML.
However, I think it would be much easier if you used JSON instead of XML. That way, you can directly work with a Javascript object, which is friendlier than navigating XML DOM. Then you can send the serialized JSON form to the server using the JSON library

Ajax Patterns has some good examples for using data islands: http://ajaxpatterns.org/wiki/index.php?title=XML_Data_Island

Related

HTML alternatives to make website?

I am using HTML to make a websites. I know an alternative languages to markup a website: XHTML, WML. Is there any more markup languages? Can I make a website only with XML or SGML?
Thank you for responses.

There aren't many.
To have a browser make your website viewable, you must provide your website in some language your browser can understand. Just plaintext and HTML work universally, there is quite some SVG and PDF support these days, but the only language you can use that can do all the things most people want a website to be able to do, you will have to use HTML or XHTML in some way. Either through JS or by some templating system, but you'll have to use it to generate what's commonly accepted as being a webpage as far as I'm aware.
That being said, there are some languages like Haml, which can be 'compiled' to HTML so you could use that instead. There are also converters for other XML-based languages and such.

If you want to deliver XML from your server, you can convert it to HTML in the browser using either
(a) XSLT 1.0: nearly all browsers have built in support for XSLT 1.0, which can be invoked using the xml-stylesheet processing instruction embedded in the XML
(b) XSLT 3.0: supported using Saxon-JS, which can be invoked using a small Javascript call in a skeletal HTML page.
(c) CSS: if the XML is reasonably close in structure to what you want to present to the user, you can attach styling properties to the XML elements using CSS.
You can also of course maintain your content in XML and convert it to rendered HTML using XSLT either at publishing time, or when each page is requested using code on the server to do the conversion on demand.

Many scientific journals use XML (actually JATS XML) for publishing scientific articles. XML in this case is transforming to HTML either on server or on client side by javascript. As example you can look here, where taking place client-side transformation. But Google will not index such XML.

What is the difference between html and xml?

Blogger blogs use XML, and Tumblr blogs use HTML, but they both do the same thing. What is the difference between XML and HTML and what are the pros and cons of using them?

HTML is a markup language for describing hypertext ("with links") documents.
XML is a generic markup language designed for using as a base for building custom markup languages.
They are different tools for different jobs.
When you write a template for either Blogger or Tumblr, you are using a template language. Blogger's template language happens to be built in XML. The template language is combined with your data to generate the HTML documents that are sent to the browser.

XML was developed to describe data and to focalize on what the data represent.
HTML was developed to display data about to focalize on the way that data looks.
HTML is about displaying data, XML is about describing information.
Reference

In an ideal world, HTML would be just one of many XML vocabularies. The only reason it isn't is history: (a) it was wild on the web before XML was formulated, and (b) it developed a culture where you could write any old garbage and the browsers would try and make sense of it, whereas XML insisted that you stick to the rules.
The W3C spent about ten years trying to make this vision happen, through the medium of XHTML, but in the end the browser vendors decided to go their own way.

XML was developed to describe data and to focalize on what the data represents.
Whereas HTML describes the structure of Web pages using markup.
In other words, HTML is used to display data, XML is used to store and transport data.

Datapower - To parse HTML

I have a situation where the underlying application provides a UI layer and this in turn has to be rendered as a portlet. However, I do not want all parts of the UI originally presented to be rendered in Portlet.
Proposed solution: Using Datapower for parsing an XML being a norm, I am wondering if it is possible to parse a HTML. I understand HTML may not be always well formed. But if there are very few HTML pages in underlying application, then a contract can be enforced..
Also, if one manages to parse and extract data out of HTML using DP, then the resultant (perhaps and XML) can be used to produce HTML5 with all its goodies.
So question: Is it advisable to use Datapower to parse an HTML page to extract an XML out of it? Prerequisite: number of HTML pages per application could vary in data but not with many pages.

I suspect you will be unable to parse HTML using DataPower. DataPower can parse well-formed XML, but HTML - unless it is explicitly designed as xHTML - is likely to be full of tags that break well-formedness.
Many web pages are full of tags like <br> or <ul><li>Item1<li>Item2<li>Item3</ul>, all of which will cause the parsing to fail.
If you really want to follow your suggested approach, you'll probably need to do something on a more flexible platform such as WAS where you can build (or reuse) a parser that takes care of all of that for you.
If you think about it, this is what your web browser does - it has all the complex rules that turn badly-formed XML tags (i.e. HTML) into a valid DOM structure. It sounds like you may be better off doing manipulation at the level of the DOM rather than the HTML, as that way you can leverage existing, well-tested parsing solutions and focus on the structure of the data. You could do this client-side using JavaScript or you could look at a server-side JavaScript option such as Rhino or PhantomJS.
All of this might be doing things the hard way, though. Have you confirmed whether or not the underlying application has any APIs or web services that IT uses to render the pages, allowing you to get to the data without the existing presentation layer getting in the way?
Cheers,
Chris

Question of parsing and HTML page originates when you want to do some processing over it. If this is the case you can face problems because datapower by default will not allow hyperlinks inside the well formed XML or HTML document [It is considered to be a security risk], however this can be overcome with appropriate settings in XML manager present.
As far as question of HTML page parsing is concerned, Datapower being and ESB layer is expected to provide message format translation and that it indeed does. So design wise it is a good place to do message format translation. Practically however you will face above mentioned problem when you try to parse HTML as XML document.
The parsing can produce any message format model you wish [theoretically] hence you can use the XSLT to achieve what you wish.
Ajitabh

What is DOM? (summary and importance)

What is the Document Object Model (DOM)?
I am asking this question because I primarily worked in .NET and I only have limited experience, but I often hear more experienced developers talk about/mention it. I read tutorials online but I am unable to make sense of the whole picture. I know that it is an API!
More specific questions are:
Where is it currently used?
What field(s) of developers use it (ex-.NET developers)?
How relevant is it for all developers in general to understand?

In general terms a DOM is a model for a structured document.
It is a central concept in today's IT and no developer can opt out of DOM. Be it in .net, in HTML, in XML or other domains where it is used.
It applies to all documents (word documents, HTML pages, XML files, etc). In the developer sphere it applies mainly in the HTML and the XML domains with slightly different meanings.
HTML
In the HTML arena, the DOM was introduced to support the revolution called in the late 90ies "dynamic HTML". Before IE4 and Netscape 4.0, HTML documents where not changeable inside the browser (all you had in these remote times to sprite up a web page was "animated GIF" !!!! and HTML was version 3.2).
Therefore dynamically manipulating inside the browser the document sent by the server was a huge revolution and initiated the march towards the attractive web sites we see today.
Javascript had been introduced by Netscape (baptised javascript to surf on the new Java trend, but unrelated) and was supported by both Netscape HTTP servers and Netscape browsers, with Internet Explorer eagerly following the move inside the browser. However When javascript is used to manipulate the content of a document, you need an easy way to designate the part of the document you want to interact with. That's where the DOM comes in. Although HTML 4 is not "well formed", browsers build an internal representation of the page with the "body" element at its top and plenty of html tags below, in a hierarchical organisation (child nodes, parent nodes attributes etc). The DOM is the model underpinning the API that allows to navigate this hierarchy.
Since both Netscape and IE browsers were competing solutions, there was little chance the NS and the IE DOM would converge. The W3C stepped in to allow smaller browser vendors to enter the competition and endeavoured to standardised the DOM. Hence the W3C DOM. All it did was just to introduce another dialect and as everybody knows it took years and two serious competitors to force MS to comply with the standards.
Even though more moderns navigating techniques like JQuery have shorthand notations for the DOM, they internally rely on the DOM.
XML
HTML made obvious the disadvantages of showing leniency towards the "well-formedness" of documents and this ushered a new craze : XML. In the web arena, XML and XSLT were first supported by IE5 and adopted in many more domains than just displaying pages.
To parse XML, in the Java Word mainly, you would develop a SAX parser which is basically a plugin to a SAX engine in which you describe what the engine should do of all the XML events (tags...) it will encounter in the parsed document. Developing a SAX parser is not straightforward but is a low footprint solution.
However you have to develop a specific one for each new document type...
So it was not long before libraries started to appear to parse any document and build an in-memory map of its hierarchy. Because it also had the same concepts of root, parents and children (inherited from SGML through HTML), it was also termed a DOM and the name applies regardless of the library.
Other domains
The concept of DOM is not restricted to or even invented for HTML or XML. A DOM is a general concept applicable to any document, especially those (the vast majority of them do) showing a hierarchical structure in which you need to navigate. You can speak about the DOM of a MS-Word document and there are APIs to navigate these as well.

The DOM is the application programming interface for well-defined HTML and XML structures (per W3C's document). It is used in any place where you interact with the elements of a web page (any element - style, text, attributes, etc). You will hear a lot about the DOM with JavaScript and/or JavaScript libraries, such as jQuery (which, of course, is JavaScript). It is also referenced with Java, ECMAScript, JScript, and VBScript.
If you are programming .NET it is important if you are doing web-based work. If you are doing application programming, it's not as important. The DOM is definitely not a thing of the past - it is used and worked within every day by many developers. With that said, there has been work towards standardization of the DOM across web browsers. (Again, libraries can help hide these differences. This is one reason jQuery is so popular. You don't have to worry about the browser specifics - you just do what you need to do.)
The document I linked to above does a great job of answering all your questions and more. I would highly recommend reading it. If you have more questions, you can also check out the links below:
What is the Document Object Model? (W3C)
Document Object Model (Wikipedia)

I'm really not going to be able to explain it any better than the Wikipedia Article on DOM
But to answer a few of your questions:
Where do we still use it?
Every web browser since the mid-nineties.
Who uses it,
Every web developer since the mid-nineties.
in what technology?
Mostly the web via JavaScript, but pretty much anytime you access XML/HTML programatically you are using some kind of DOM implementation.
How important is it for anyone in .net
carrier? [sic]
Extremely, although you probably use it without even knowing it.
Is this just a thing of the past which
was heavily used but had problems?
If it is then somebody needs to tell John Resig that he has wasted the past 3 years of his life.

When a browser loads an HTML page, its convert it to the Document Object Model (DOM).
The browser's produced HTML DOM, constructs as a tree that consists of all your HTML page element as objects. For example, assuming that you load below HTML page on a browser:
<!DOCTYPE html>
<html>
<head>
<title>website title</title>
</head>
<body>
<p id="js_paragraphId">I'm a paragraph</p>
some website
</body>
</html>
After loading, the browser converts it to:
Some of the ability of scripting languages on HTML DOM consists of:
1- Change all the HTML elements in the page.
2- change all the HTML attributes in the page.
3- Change all the CSS styles on the page.
4- Remove existing HTML elements and attributes.
5- Add new HTML elements and attributes.
6- React to all existing HTML events in the page.
7- create new HTML events on the page.
Let's back to your questions:
1- It currently used in all modern browsers.
2- Front-end developers.
3- All Front-end developers that using scripting languages especially JavaScript.

What is DOM?
The Document Object Model (DOM) is a programming API for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated. A standard defined by w3 consortium.
Source: http://www.w3.org/TR/WD-DOM/introduction.html

What are the advantages of creating web pages with XML instead of HTML?

From time to time, I see web pages whose content is solely written in XML (not HTML or XHTML). These pages usually have some style sheets (either XSLT or CSS) attached to them which makes them look like any other ordinary web page.
My question is, what are the advantages of such an approach (if any), and why would anyone choose to work this way?
EDIT: If this is a good thing, why is it not widespread?
EDIT 2: Thanks everyone for the great responses. They really enlightened me. I also found this question whose content is also related.

It's easier to generate it programmatically and reuse it for other purposes than displaying as webpage.
Update:
EDIT: If this is a good thing, why is it not widespread?
Not everyone needs to generate it programmatically or reuse it for other purposes than displaying as webpage. It's then easier to use plain HTML.

One possible advantage would be for use of the data of the page in something other than a web browser; that would (presumably) be easier to do if a page's content were well-formed XML. Of course in theory a well-formed, semantic XHTML page should be nearly as able to be parsed, as well.
It can also be easier to generate XML instead of XHTML, depending on the data source.

When you are getting XML data in to your system, and you are supposed to present this XML data then it is much easier to write some XSLT for that XML instead of parsing it using some sort of parser and then presenting the data.
That can be a valid point for using XML instead of XHTML or HTML
Update
To answer your question on why this is not widespread, is because XSTL is tedious and hard to work with. Specifically XPath, which can be for some people quite difficult to use.

Those pages use XSLT to get rendered on the client side. Not every browser (especially older ones) supports rendering XML + XSLT. XML can however be used server-side as template and get transformed to HTML by the application running on the server. I personally don't see any advantages to this approach.

There are a lot more web pages that are written solely in XML than you know. You're only seeing the ones that do the XSLT transformation on the client side. Server-side transformation of XML is not at all unusual, because there's a plethora of things that produce data in XML, and transforming XML to HTML in XSLT is straightforward. You'll never know this is happening if you just look at the HTML, which bears no signs of having been generated via XSLT.

Personally, I don't understand it either though one of the biggest problems is support in IE. I created a skeleton ecommerce site serving XML, transformed by XSLT and styled using CSS. I sorely missed the ability to use XLink and other wonderful XML features. It's also nice to be able to tag the data for what it is. I used a 'menu' tag for the restaurant menus. 'price' tags for prices and so on. If a user clicked on a link to change menus, all I had to do was send the name of the item, the price and the description instead of the complete page. iirc, a 4K or more HTML menu page was only 200 bytes of sent data.
As far as the "one error makes everything crash in XML" type comments, the same is true of any programming language so proper coding should be no bother for programmers and careful HTML/CSS types.
Before anyone says that what I did was actually XHTML...no. I served XML. I did call up XHTML namespaces when needed for links, images and HTML type things but only when necessary.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008