What is DOM? (summary and importance) - html

What is the Document Object Model (DOM)?
I am asking this question because I primarily worked in .NET and I only have limited experience, but I often hear more experienced developers talk about/mention it. I read tutorials online but I am unable to make sense of the whole picture. I know that it is an API!
More specific questions are:
Where is it currently used?
What field(s) of developers use it (ex-.NET developers)?
How relevant is it for all developers in general to understand?

In general terms a DOM is a model for a structured document.
It is a central concept in today's IT, and no developer can opt out of the DOM, be it in .NET, HTML, XML or any other domain where it is used.
It applies to all documents (Word documents, HTML pages, XML files, etc.). In the developer sphere it applies mainly in the HTML and XML domains, with slightly different meanings.
HTML
In the HTML arena, the DOM was introduced to support the revolution known in the late 90s as "dynamic HTML". Before IE4 and Netscape 4.0, HTML documents were not changeable inside the browser (in those remote times HTML was at version 3.2, and all you had to spice up a web page was the animated GIF).
Being able to manipulate, inside the browser, the document sent by the server was therefore a huge revolution, and it started the march towards the attractive web sites we see today.
JavaScript had been introduced by Netscape (it was named JavaScript to ride the new Java trend, but the two are unrelated) and was supported by both Netscape HTTP servers and Netscape browsers, with Internet Explorer eagerly following suit inside the browser. However, when JavaScript is used to manipulate the content of a document, you need an easy way to designate the part of the document you want to interact with. That's where the DOM comes in. Although HTML 4 is not "well formed", browsers build an internal representation of the page with the "html" element at its top and plenty of HTML elements below it, in a hierarchical organisation (child nodes, parent nodes, attributes, etc.). The DOM is the model underpinning the API that lets you navigate this hierarchy.
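To make the parent/child/attribute vocabulary concrete, here is a minimal sketch using Python's standard xml.dom.minidom module, which implements the W3C DOM interfaces outside a browser. The markup is a made-up example and is assumed to be well-formed; in a browser you would reach the same properties (parentNode, nextSibling, getAttribute) from JavaScript.

```python
from xml.dom import minidom

# Hypothetical, well-formed markup used only for illustration.
document = minidom.parseString(
    '<html><body><h1 class="title">Hello</h1><p>World</p></body></html>'
)

# Designate one part of the document, then walk the hierarchy from there.
heading = document.getElementsByTagName("h1")[0]

root_name = document.documentElement.tagName   # element at the top of the tree
parent_name = heading.parentNode.tagName       # one level up
sibling_name = heading.nextSibling.tagName     # next node at the same level
css_class = heading.getAttribute("class")      # attribute lookup
child_names = [child.tagName for child in heading.parentNode.childNodes]

print(root_name, parent_name, sibling_name, css_class, child_names)
```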
Since the Netscape and IE browsers were competing solutions, there was little chance that the NS and IE DOMs would converge. The W3C stepped in to allow smaller browser vendors to enter the competition and endeavoured to standardise the DOM. Hence the W3C DOM. At first, all this did was introduce yet another dialect, and as everybody knows it took years and two serious competitors to force MS to comply with the standards.
Even though more modern libraries such as jQuery offer shorthand notations for navigating the document, they rely on the DOM internally.
XML
HTML made obvious the disadvantages of leniency towards the "well-formedness" of documents, and this ushered in a new craze: XML. In the web arena, XML and XSLT were first supported by IE5, and XML was adopted in many more domains than just displaying pages.
To parse XML, mainly in the Java world, you would develop a SAX parser, which is basically a plugin to a SAX engine in which you describe what the engine should do with each of the XML events (opening tags, closing tags, text...) it encounters in the parsed document. Developing a SAX parser is not straightforward, but it is a low-footprint solution.
However, you have to develop a specific one for each new document type...
So it was not long before libraries started to appear to parse any document and build an in-memory map of its hierarchy. Because it also had the same concepts of root, parents and children (inherited from SGML through HTML), it was also termed a DOM and the name applies regardless of the library.
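The contrast between the two approaches can be sketched with Python's standard library, which ships both a SAX engine (xml.sax) and a DOM builder (xml.dom.minidom). The order document and the ItemCollector class are hypothetical names chosen for illustration; the point is only that the SAX handler is written for one document shape, while the DOM code queries a generic in-memory tree.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical document used only for illustration.
XML = b"<order><item>pen</item><item>ink</item></order>"


class ItemCollector(xml.sax.ContentHandler):
    """SAX style: react to events as the engine streams through the document."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered until the end tag.
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


handler = ItemCollector()
xml.sax.parseString(XML, handler)
sax_items = handler.items

# DOM style: build the whole tree in memory, then navigate it.
dom = minidom.parseString(XML)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]

print(sax_items, dom_items)
```

The SAX version never holds the whole document in memory, which is the low footprint mentioned above; the DOM version trades memory for not having to write a handler per document type.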
Other domains
The concept of a DOM is not restricted to, or even invented for, HTML or XML. A DOM is a general concept applicable to any document, especially those with a hierarchical structure that you need to navigate (which is the vast majority of them). You can speak about the DOM of an MS Word document, and there are APIs to navigate these as well.

The DOM is the application programming interface for well-defined HTML and XML structures (per W3C's document). It is used in any place where you interact with the elements of a web page (any element - style, text, attributes, etc). You will hear a lot about the DOM with JavaScript and/or JavaScript libraries, such as jQuery (which, of course, is JavaScript). It is also referenced with Java, ECMAScript, JScript, and VBScript.
If you are programming .NET it is important if you are doing web-based work. If you are doing application programming, it's not as important. The DOM is definitely not a thing of the past - it is used and worked with every day by many developers. With that said, there has been work towards standardization of the DOM across web browsers. (Again, libraries can help hide these differences. This is one reason jQuery is so popular. You don't have to worry about the browser specifics - you just do what you need to do.)
The document I linked to above does a great job of answering all your questions and more. I would highly recommend reading it. If you have more questions, you can also check out the links below:
What is the Document Object Model? (W3C)
Document Object Model (Wikipedia)

I'm really not going to be able to explain it any better than the Wikipedia article on the DOM.
But to answer a few of your questions:
Where do we still use it?
Every web browser since the mid-nineties.
Who uses it,
Every web developer since the mid-nineties.
in what technology?
Mostly the web via JavaScript, but pretty much anytime you access XML/HTML programmatically you are using some kind of DOM implementation.
How important is it for anyone in .net carrier? [sic]
Extremely, although you probably use it without even knowing it.
Is this just a thing of the past which was heavily used but had problems?
If it is then somebody needs to tell John Resig that he has wasted the past 3 years of his life.

When a browser loads an HTML page, it converts it to the Document Object Model (DOM).
The HTML DOM that the browser produces is built as a tree containing every element of your HTML page as an object. For example, suppose you load the HTML page below in a browser:
<!DOCTYPE html>
<html>
<head>
<title>website title</title>
</head>
<body>
<p id="js_paragraphId">I'm a paragraph</p>
some website
</body>
</html>
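After loading, the browser converts it to a tree of objects: the html element at the root, with head and body as its children, and the title, the paragraph and the text nested below them.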
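To see that tree for yourself, here is a minimal sketch using Python's standard xml.dom.minidom module rather than a browser. It assumes the page can be parsed as well-formed XML (the DOCTYPE line is left out for simplicity); a real browser uses an HTML parser and also keeps nodes this sketch skips, such as whitespace-only text. The outline helper is a hypothetical name of my own.

```python
from xml.dom import minidom

# The example page from above, minus the DOCTYPE line.
PAGE = """<html>
<head>
<title>website title</title>
</head>
<body>
<p id="js_paragraphId">I'm a paragraph</p>
some website
</body>
</html>"""


def outline(element, depth=0):
    """Return the element tree as indented lines, skipping whitespace-only text."""
    lines = ["  " * depth + element.tagName]
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            lines.extend(outline(child, depth + 1))
        elif child.nodeType == child.TEXT_NODE and child.data.strip():
            lines.append("  " * (depth + 1) + '"' + child.data.strip() + '"')
    return lines


document = minidom.parseString(PAGE)
tree_lines = outline(document.documentElement)
print("\n".join(tree_lines))
```

Each indentation level in the printed outline corresponds to one parent/child step in the DOM tree.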
With the HTML DOM, a scripting language can, among other things:
1- Change all the HTML elements in the page.
2- Change all the HTML attributes in the page.
3- Change all the CSS styles on the page.
4- Remove existing HTML elements and attributes.
5- Add new HTML elements and attributes.
6- React to all existing HTML events in the page.
7- Create new HTML events on the page.
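Several of the capabilities listed above (changing, adding and removing elements and attributes) can be sketched outside a browser with Python's standard xml.dom.minidom module, which exposes the same W3C method names that JavaScript uses. The markup below is a made-up example; styles and events need a real browser and are not covered here.

```python
from xml.dom import minidom

# Hypothetical, well-formed markup used only for illustration.
document = minidom.parseString(
    """<html><body><p id="intro">I'm a paragraph</p><p id="old">obsolete</p></body></html>"""
)
body = document.getElementsByTagName("body")[0]
paragraphs = document.getElementsByTagName("p")
intro, old = paragraphs[0], paragraphs[1]

# Change an element's text.
intro.firstChild.data = "Updated text"

# Change (add) an attribute, then remove another one.
intro.setAttribute("class", "highlight")
intro.removeAttribute("id")

# Remove an existing element.
body.removeChild(old)

# Add a new element with some text inside it.
note = document.createElement("span")
note.appendChild(document.createTextNode("new element"))
body.appendChild(note)

result = body.toxml()
print(result)
```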
Back to your questions:
1- It is currently used in all modern browsers.
2- Front-end developers.
3- All front-end developers who use scripting languages, especially JavaScript.

What is DOM?
The Document Object Model (DOM) is a programming API for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated. It is a standard defined by the W3C (World Wide Web Consortium).
Source: http://www.w3.org/TR/WD-DOM/introduction.html

Related

Why Shadow DOM when we have iframes?

I heard of shadow DOM which seems to solve the problem of encapsulation in web widget development. DOM and CSS rules get encapsulated which is good for maintenance. But then isn't this what iframes are for? What problems are there with iframes that made it necessary for W3C to come up with Shadow DOM or HTML5 Web Components?
Today, iframes are commonly used to assure separate scope and styling. Examples include Google's map and YouTube videos.
However, iframes are designed to embed another full document within the current HTML document. This means accessing values in a given DOM element in an iframe from the parent document is a hassle by design. The DOM elements are in a completely separate context, so you need to traverse the iframe’s DOM to access the values you’re looking for. Contrast this with web components which offer an elegant way to expose a clean API for accessing the values of custom elements.
Imagine creating a page using a set of 5 iframes that each contain one component. Each component would need a separate URL to host the iframe’s content. The resulting markup would be littered with iframe tags, yielding markup with low semantic meaning that is also clunky to read and manage. In contrast, web components support declaring rich semantic tags for each component. These tags operate as first class citizens in HTML. This aids the reader (in other words, the maintenance developer).
In summary, while both iframes and the shadow DOM provide encapsulation, only the shadow DOM was designed for use with web components and thus avoids the excessive separation, setup overhead, and clunky markup that occurs with iframes.
iframes are used purely as encapsulation objects...
with the exception of SVG (more on that later), today’s Web platform
offers only one built-in mechanism to isolate one chunk of code from
another — and it ain’t pretty. Yup, I am talking about iframes. For
most encapsulation needs, frames are too heavy and restrictive.
Shadow DOM allows you to provide better and easier encapsulation, by attaching a separate, scoped DOM subtree to an element so that its markup and styles are isolated from the rest of the page.
For example imagine you build a widget (as I have) that is used across websites.
Your widget might be affected by the CSS on the page and look horrible, whereas with Shadow DOM it will not :)
Here is an excellent article on the topic:
What the Heck is Shadow DOM?

Is a browser obliged to use a DOM to render an HTML page?

I was reading the page about the Document Object Model on Wikipedia.
One sentence caught my interest; it says:
A Web browser is not obliged to use DOM in order to render an HTML
document.
You can find the entire context on the page right here.
I don't understand: is there any other way to render an HTML document? What exactly does this sentence mean?
Strictly speaking IE (at least < IE9) does not use a DOM to render an HTML document. It uses its own internal object model (which is not always a pure tree structure).
The DOM is an API, and IE maps the API methods and properties onto actions on its internal model. Since the DOM assumes a tree structure, the mapping is not always perfect, which accounts for a number of oddities when accessing the document via the DOM in IE.
The primary job of a browser is to display HTML. Most browsers use a DOM; they parse the HTML, create a DOM structure from it (which can also be used in JavaScript) and render the page based on that DOM.
But if a browser chooses not to, it is free to do so. I wouldn't know why it would, and I certainly don't understand why this line is explicitly mentioned in the Wikipedia article.

Yet another question regarding the html5 dtd/schema

If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation? And by document validation, I mean "how are we supposed to ensure our html5 documents are both syntactically accurate and structurally sound?" Please help! This is going to become a huge problem for our industry if we have no way to accurately validate HTML5 documents!
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax? I mean, it seems rather rudimentary to just look at the document and say "yep. that's a valid xml document." What about custom tags? What about tag attributes? It seems like the W3C is leaving us out in the cold a little bit here.
Maybe the best answer will be found in the HTML editor. But then you get DTD/schema fragmentation. Each editor vendor coming up with their own rendition of what a valid structure is.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
So, any thoughts?
If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation?
With a specialized HTML5 validator rather than a generic SGML or XML validator.
Obviously, as the specification is still in draft form, the tools that do exist are immature and likely to be out of date or become out of date.
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax?
Either use a different tool or download the W3C validator and run a local copy. It has a SOAP API so writing a batch validation tool isn't difficult.
What about custom tags?
HTML5 doesn't allow custom elements.
What about tag attributes?
The only custom attributes in HTML5 are data-* attributes, so an HTML 5 validator can recognize them.
It seems like the W3C is leaving us out in the cold a little bit here.
It seems like you expect the state of QA tools for HTML 5 (unfinished) to be up to the same standard as those for HTML 4 (over a decade old). This isn't a realistic expectation.
Maybe the best answer will be found in the HTML editor. But then you get DTD/schema fragmentation. Each editor vendor coming up with their own rendition of what a valid structure is.
The specification is clear (although in flux) even if it isn't expressed in the form of a DTD or schema. If each editor has a different idea of what is valid, then most or all of them are going to be either out of date or just buggy.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
If you need to live in the bleeding edge, then you have to accept the limitations and risks of doing so.
You might find this question/answer interesting: Will HTML 5 validation be worth the candle? The answer is written by the developer of http://about.validator.nu/.
You should start by taking a look at http://about.validator.nu/ .
Some, though not all, of your concerns are addressed there. You can host your own validator, there's a Python-based submission script, you can use a RESTful web service API, and there are ways to get validation output in a variety of different forms.
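As a rough idea of what batch validation against such a web service could look like, here is a minimal Python sketch. It assumes the checker accepts a document as the body of an HTTP POST and, when asked for JSON output, returns a report containing a "messages" list whose entries have "type" and "message" fields; the endpoint URL, the function names and the folder layout are my own assumptions, so check them against the validator's documentation (or point the URL at your own hosted copy) before relying on this.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: public checker endpoint with JSON output; a self-hosted copy
# would have its own URL.
CHECKER_URL = "https://validator.nu/?out=json"


def build_request(html_bytes, checker_url=CHECKER_URL):
    """Prepare (but do not send) a POST request carrying one HTML document."""
    return urllib.request.Request(
        checker_url,
        data=html_bytes,
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


def error_messages(response_text):
    """Pull the error messages out of a JSON report shaped like {"messages": [...]}."""
    report = json.loads(response_text)
    return [
        entry["message"]
        for entry in report.get("messages", [])
        if entry.get("type") == "error"
    ]


def validate_folder(folder):
    """Send every .html file in a folder to the checker (needs network access)."""
    results = {}
    for path in sorted(Path(folder).glob("*.html")):
        request = build_request(path.read_bytes())
        with urllib.request.urlopen(request) as response:
            results[path.name] = error_messages(response.read().decode("utf-8"))
    return results
```

Calling validate_folder("site/") would return a dictionary mapping each file name to its list of error messages; for hundreds of pages, a locally hosted checker is kinder to the public service.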
I can't however see a simple way to integrate XHTML5 with other applications of XML such that one can easily create a validator of such compound documents. Not that there's really been a way to do that with earlier versions of XHTML either though.
This is working well for me: https://github.com/hober/html5-el
To get this to work, I renamed the default '/etc/schema/schemas.xml' file in order to move it out of the way and let the 'html5-el' one be used by nxml-mode.
If there is no DTD or schema to validate the H5 document against, how are we supposed to do document validation? And by document validation, I mean "how are we supposed to ensure our html5 documents are both syntactically accurate and structurally sound?" Please help! This is going to become a huge problem for our industry if we have no way to accurately validate HTML5 documents!
If testing pages with either Firefox or Opera, both of those will report errors such as code that is not "well-formed" and mismatched tags. Beyond that, one of the validators such as validator.w3.org or validator.nu will definitely help.
Sure, the W3C has an online tool that validates individual pages. But, if I'm creating A LOT of pages (hundreds, say) and I want to validate them in a sort of batch mode, what is the accepted method of ensuring valid structure and syntax? I mean, it seems rather rudimentary to just look at the document and say "yep. that's a valid xml document."
There are ways to run the W3C validator in batch mode.
What about custom tags? What about tag attributes? It seems like the W3C is leaving us out in the cold a little bit here.
The easy answer to that one is that "custom tags" are simply not considered valid. The Working Group has thoroughly addressed the issue of "distributed extensibility", particularly with respect to allowing "decentralized parties to create their own languages" and "extension attributes" (http://lists.w3.org/Archives/Public/public-html/2011Feb/0085.html). There are numerous ways to extend HTML (http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#extensibility) but adding custom tags is not one of them. Custom data and microdata attributes should validate fine.
Maybe the answer is "wait for HTML5 to become official". But I really can't wait for that. I need to start creating and validating content now. I have applications I want to publish that can only be accomplished with html5.
Since HTML 5 was stabilized at the end of last year (Dec. 2010), IMO we don't need to wait for it to become an official "recommendation" by the W3C. The stabilized spec provides a solid base that all browser vendors can implement consistently, and a foundation for the ongoing evolution of the spec beyond HTML 5, which is now being called the "HTML Living Standard" (Jan. 2011 and later). There is a good diagram of this at http://www.HTML-5.com/html-versions-and-history.html#html-versions (scroll down to see the diagram).

What's the difference between the Browser Object Model and the Document Object Model?

What is the difference between the two?
The Browser Object Model is a larger representation of everything provided by the browser including the current document, location, history, frames, and any other functionality the browser may expose to JavaScript. The Browser Object Model is not standardized and can change based on different browsers.
The Document Object Model is standardized and is specific to the current HTML document. It is exposed by the Browser Object Model (i.e., the DOM is a subset of the BOM).
BOM
This is an informal term as there is no W3C or WHATWG standard that mentions it.
One simple definition would be that the BOM encompasses the entire object structure which is accessible via scripting in the browser, beginning with the window object which "contains everything else", since it's the global object.
The window object contains lots of properties (try console.dir( window );). These properties are specified in numerous web standards. The "core" specification of the window object is, as of now, still part of the HTML standard - see here, but I guess it's only a matter of time until the editors decide to transfer this specification into a separate standard. I'm definitely rooting for a "BOM" or "Browser Environment" standard to make things more logical and appropriate.
DOM
This on the other hand is a formal term. You can find definitions of this term in various standards, for instance the DOM4 standard states:
The DOM is a language- and platform neutral interface that allows
programs and scripts to dynamically access and update the content and
structure of documents.
Notice how the emphasis is on documents. Unlike the BOM, which is basically an umbrella term for all the APIs in the browser, the DOM comprises only those APIs which deal with documents.
A simple definition would be that the DOM is implemented as the document object (which is the root of the DOM tree btw). Basically, the DOM tree (and everything inside it) can be considered part of the DOM. Analogously, everything beyond the DOM-tree is not part of the DOM.
beyond the DOM-tree == all the properties of window except the document object
"Browser Object Model" (BOM) is a term from the early 2000s that didn't catch on and was replaced[1] with the term "Web APIs"
Web APIs are the JavaScript APIs available to web pages: any objects/interfaces, their properties, methods, and events the browser makes available to the page, except for the objects, like String, that are part of JavaScript language itself.
The DOM (Document Object Model), in context of web development, is a subset of Web APIs concerned with manipulation of the structure and contents of web pages and other "documents".
Historically, the DOM was designed as "a platform- and language-neutral interface" with DOM Level 1 specification describing both the ECMAScript (JavaScript) and Java bindings in appendices. You might still use DOM APIs to work with XML/HTML data from outside the browser (e.g. using Xerces in Java), but the "Living Standard" version of the DOM specification is maintained with the focus on the web use-case, and the most recent W3C implementation report includes mainly (if not only) web browsers.
[1] See Google trends for "Browser object model", and how in a modern book (JavaScript Cookbook: Programming the Web) it's only briefly mentioned as 'BOM - see Web API'.
[answer rewritten in 2019]
I think BOM = DOM plus (or minus) the non-standard properties of the browser, so every browser has its own BOM.
The BOM is a wider view of the entire browser, whereas the DOM is restricted to the document in the window and its associated methods. View the complete article:
javascript-browser-object-model

What's the key difference between HTML 4 and HTML 5?

What are the key differences between HTML4 and HTML5 draft?
Please keep the answers related to changed syntax and added/removed html elements.
HTML5 has several goals which differentiate it from HTML4.
Consistency in Handling Malformed Documents
The primary one is consistent, defined error handling. As you know, HTML purposely supports 'tag soup', or the ability to write malformed code and have it corrected into a valid document. The problem is that the rules for doing this aren't written down anywhere. When a new browser vendor wants to enter the market, they just have to test malformed documents in various browsers (especially IE) and reverse-engineer their error handling. If they don't, then many pages won't display correctly (estimates place roughly 90% of pages on the net as being at least somewhat malformed).
So, HTML5 is attempting to discover and codify this error handling, so that browser developers can all standardize and greatly reduce the time and money required to display things consistently. As well, long in the future after HTML has died as a document format, historians may still want to read our documents, and having a completely defined parsing algorithm will greatly aid this.
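The difference between a strict parser and a forgiving one is easy to demonstrate with Python's standard library. In this sketch, the XML parser behind xml.dom.minidom rejects mis-nested markup outright, while html.parser simply reports the tags in the order it sees them. Note the assumption: html.parser is only a lenient tokenizer and does not implement the HTML5 error-recovery algorithm described above; it is used here purely to show what "tag soup" tolerance means. TagLogger and strict_parse_ok are hypothetical names.

```python
from html.parser import HTMLParser
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Mis-nested markup: </b> closes before the <i> opened inside it.
TAG_SOUP = "<p><b>bold<i>both</b>italic</i></p>"


class TagLogger(HTMLParser):
    """Record tag events without checking that they nest correctly."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("open " + tag)

    def handle_endtag(self, tag):
        self.events.append("close " + tag)


def strict_parse_ok(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        minidom.parseString(markup)
        return True
    except ExpatError:
        return False


logger = TagLogger()
logger.feed(TAG_SOUP)
logger.close()

print(strict_parse_ok(TAG_SOUP))  # the strict parser refuses the document
print(logger.events)              # the lenient one carries on regardless
```

What a lenient parser should then do with such events (which element ends up inside which) is exactly the part that was unspecified before HTML5 and had to be reverse-engineered.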
Better Web Application Features
The secondary goal of HTML5 is to develop the ability of the browser to be an application platform, via HTML, CSS, and JavaScript. Many elements have been added directly to the language that in HTML4 require Flash or JS-based hacks, such as <canvas>, <video>, and <audio>. Other useful additions will make developing web applications much simpler for developers and much faster for users, since many things will be supported natively rather than hacked in via JavaScript: Local Storage (a JS-accessible key-value database built into the browser, for storing information beyond what cookies can hold), new input types such as date, for which the browser can expose an easy user interface (so that we don't have to use our JS-based calendar date-pickers), and browser-supported form validation.
Improved Element Semantics
There are many other smaller efforts taking place in HTML5, such as better-defined semantic roles for existing elements (<strong> and <em> now actually mean something different, and even <b> and <i> have vague semantics that should work well when parsing legacy documents) and adding new elements with useful semantics - <article>, <section>, <header>, <aside>, and <nav> should replace the majority of <div>s used on a web page, making your pages a bit more semantic, but more importantly, easier to read. No more painful scanning to see just what that random </div> is closing - instead you'll have an obvious </header>, or </article>, making the structure of your document much more intuitive.
From Wikipedia:
New parsing rules oriented towards flexible parsing and compatibility
New elements – section, video, progress, nav, meter, time, aside, canvas
New input attributes – dates and times, email, url
New attributes – ping, charset, async
Global attributes (that can be applied for every element) – id, tabindex, repeat
Deprecated elements dropped – center, font, strike
HTML5 introduces a number of APIs that help in creating Web applications. These can be used together with the new elements introduced for applications:
An API for playing of video and audio which can be used with the new video and audio elements.
An API that enables offline Web applications.
An API that allows a Web application to register itself for certain protocols or media types.
An editing API in combination with a new global contenteditable attribute.
A drag & drop API in combination with a draggable attribute.
An API that exposes the history and allows pages to add to it to prevent breaking the back button.
You'll want to check HTML5 Differences from HTML4: W3C Working Group Note 9 December 2014 for the complete differences. There are many new elements and element attributes. Some elements were removed and others have different semantic value than before.
There are also APIs defined, such as the use of canvas, to help build the next generation of web apps and make sure implementations are standardized.
You might be interested in this list of HTML5 elements and attributes.
Also, please note that it's "HTML 4", not "HTML4". Indeed, for HTML 5, both variants are used, but there is an important difference in meaning. HTML 5 refers to the name of the W3C specification, whereas "HTML5" is the document type of those HTML files with a text/html MIME type that follow this spec.
The same goes for XHTML 5 vs. XHTML5.
The W3C now provides an official list of differences on its site:
http://www.w3.org/TR/html5-diff/
HTML 5 invites you to add a lot of semantic value to your code. What's more, there are native solutions for embedding multimedia content.
The rest is important, but it's more technical sugar that will save you from doing the same stuff with a client programming language.
In short, it is much simpler than earlier HTML: the long doctype is gone, and the center and font tags have been removed.
I also covered this difference on my blog:
http://ravisinghblog.in/key-difference-between-html-and-html-5/