In which case can the DOM generate two trees?
I had this question on a test, and I said this happens when we have two HTML documents in the same web page.
Is this true?
There are a number of ways to do that, depending on how you define "tree".
You can have an <iframe> in your document, but that tree will have its own window, and will not be directly connected to your original tree.
You can have an <html> element inside your HTML (which is invalid HTML, but will still work), but that will actually be a subtree of the same document.
You can use DOM APIs to build a detached <html> element.
You can also simply instantiate a separate Document object, e.g. through DOMParser, an XHR with .responseType = "document", or the DOMImplementation.createDocument factory method. Each of these creates an independent DOM tree.
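As a rough illustration of the last point, here is a minimal sketch using Python's standard xml.dom.minidom module as a stand-in for the browser APIs (it is not DOMParser itself, and the markup is made up). It creates one Document by parsing and another through the createDocument factory, and shows that the two trees are unrelated:

```python
from xml.dom.minidom import getDOMImplementation, parseString

# Two Document objects created independently: one parsed from markup,
# one built through the DOMImplementation.createDocument factory.
parsed = parseString("<html><body><p>first</p></body></html>")
created = getDOMImplementation().createDocument(None, "html", None)

# Each document has its own root element, so there are two separate trees.
first_root = parsed.documentElement
second_root = created.documentElement
roots_are_distinct = first_root is not second_root

# Every node records the document that owns it.
paragraph = parsed.getElementsByTagName("p")[0]
owned_by_parsed = paragraph.ownerDocument is parsed
owned_by_created = paragraph.ownerDocument is created

print(roots_are_distinct, owned_by_parsed, owned_by_created)
```

The same idea should carry over to a browser: a node belongs to exactly one document at a time, and a second Document object means a second tree.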
Elm does not appear to support the HTML document's head node, <head>..</head>.
The question is: why not support the complete HTML document with suitable functions? Such an inclusion would seem to allow convenient use of external resources such as style sheets.
Apart from DOCTYPE, HTML tags uniformly take the form tagName attrList elmList. Perhaps a set of appendAttr and appendElm functions could be concocted to allow flexibility in specifying a more comprehensive VirtualDom.
Am I missing something?
By the time that your Elm code has loaded and starts running, the browser has already read in the <head> of the HTML page that contains the Elm code, so it's too late to influence the contents of the <head>.
Elm can be embedded into an element in the page, or run full-screen (which appears to add a child to the <body>). Elm can only manipulate content within its container, not outside of it. In particular, all elements that Elm generates will be contained within the <body> of the document, whereas <head> is a sibling of <body>.
It's possible to generate HTML elements with any name you like, using Html.node "elementName". So it's possible to create a <head> element in Elm. However, a <head> element created this way would end up inside <body>, and I would expect browsers to ignore it.
Luke's answer is of course perfectly correct, but nothing prevents you from updating the head of a document using JavaScript via a port. Here is an example that updates the title (the tab name).
In Elm
port module Ports exposing (..)
port title : String -> Cmd msg
with this sort of update function
update message model =
    case message of
        SetTitle name ->
            (model, Ports.title name)
In JavaScript
var elm = Main.fullscreen();
elm.ports.title.subscribe( title => {
    document.title = title;
});
Elm is a (mostly) purely functional language, meaning it attempts to minimize all side effects that could lead to errors. As the Elm compiler has no way to know whether a CSS file will exist at runtime, it cannot safely say whether including a CSS file will result in an error. Therefore, it is not something that is likely to be included in the future.
That being said, rtfeldman of NoRedInk has created the repo https://github.com/rtfeldman/elm-css, which allows Elm code to mirror CSS, ensuring safety as it goes.
To give your body a certain background color, this Elm code suffices:
[ body
    [ backgroundColor (rgb 200 128 64)
    , color (hex "CCFFFF")
    ]
]
and will compile to a .css file for you.
I recently tried to work with JSoup to parse HTML documents. I went through the tutorial on JSoup and found that the select method might be what I am looking for.
What I am trying to accomplish is to find all elements in an HTML document that have a certain class. To test that, I tried this with the Amazon web page (idea: find all deals with certain offers).
So I inspected the web page to see which classes and ids are being used, and then I tried to integrate this into a small code snippet. In this example I found the following element:
<span id="dealTitle" class="a-size-base a-color-link dealTitleTwoLine restVisible singleCellTitle autoHeight">PROCAVE Matratzen-Brücke aus Schaumstoff 25 x 200 cm für ...</span>
This element is embedded in other elements and exists multiple times (for each deal of course). So here is my code to read the deal elements:
Document doc = Jsoup.connect("https://www.amazon.de/gp/angebote/ref=gbph_ftr_s-8_cd61_page_1?gb_f_LD=dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL%252CUPCOMING,dealTypes:LIGHTNING_DEAL,page:1,sortOrder:BY_SCORE,dealsPerPage:8&pf_rd_p=425ddcb8-bed4-4e85-ac0f-c1a79d14cd61&pf_rd_s=slot-8&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=A3JWKAKR8XB7XF&pf_rd_r=BTHRY008J9N3N5CCMNEN&gb_f_second=dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,dealTypes:COUPON_DEAL,page:8,sortOrder:BY_SCORE,dealsPerPage:8").timeout(0).get();
Elements deals = doc.select("span.a-size-base.a-color-link.dealTitleTwoLine.restVisible.singleCellTitle.autoHeight");
for (Element deal : deals) {
    if (deal.text().contains("ItemMatch")) {
        System.out.println("Found deal: " + deal.text());
    }
}
Unfortunately I can't get the element I am looking for: deals always has a size of 0. I tried to modify my select to use only some of the classes, I added the id attribute, and so on. Nevertheless, I do not get the elements (in this case they are nested inside some others). If I try an element which is above this element in the DOM hierarchy (e.g. the div with class "a-section a-spacing-none slotContainer"), it is found.
Do I actually need to specify the whole DOM hierarchy (by using ">" in my select expressions)? I expected to be able to define a selector and have JSoup traverse and search the whole DOM tree.
No, you do not have to specify the full DOM hierarchy. Your test should work if the elements are really part of the DOM. I suspect that they are not part of the DOM as it is loaded by JSoup. The reason might be that the inner DOM nodes are filled in by JavaScript through AJAX. JSoup does not run JavaScript, so dynamically loaded parts of the DOM are not accessible. To achieve what you want, you can either look into the AJAX calls directly and analyze them, or move to another solution such as Selenium WebDriver, which drives a real browser including a working JavaScript engine.
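The effect can be reproduced without JSoup or Amazon. Below is a minimal sketch using Python's standard html.parser module (a static parser that, like JSoup, does not execute scripts). The markup is a made-up stand-in for a server response, and ClassCounter is a hypothetical helper, not part of any library:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags with a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


# Hypothetical server response: the container exists, the deals do not yet.
static_html = """
<div class="a-section a-spacing-none slotContainer"></div>
<script>
  // A browser would run this and insert the span elements;
  // a static parser only sees it as text inside the script.
  document.querySelector(".slotContainer").innerHTML =
    '<span class="dealTitleTwoLine">Some deal</span>';
</script>
"""

spans = ClassCounter("span", "dealTitleTwoLine")
spans.feed(static_html)

containers = ClassCounter("div", "slotContainer")
containers.feed(static_html)

print("spans found:", spans.count)
print("containers found:", containers.count)
```

The outer container is found while the inner spans are not, which matches the symptom described in the question. A quick way to check a real page is to compare "view source" (what the server sent) with the element inspector (the DOM after scripts have run).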
When a browser receives the initial root HTML page, how does it determine exactly which other objects should be requested? Is there a list of HTML tags for which the browser will always request the associated content from the server when they are detected?
I realize I need to implement an HTML parser for this; however, I am not sure of all the individual tags and attributes that are important.
Browsers parse the HTML, and know which elements (with which attributes) require additional resources to be loaded.
In other words, they implement an HTML parser.
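A minimal sketch of that idea, using Python's standard html.parser module. The RESOURCE_ATTRS table and the sample page are assumptions made for illustration and are deliberately not exhaustive: real browsers follow the HTML specification and handle many more cases (srcset, CSS url() references, preload hints, and so on).

```python
from html.parser import HTMLParser

# Assumed, non-exhaustive mapping of tag -> attribute that names a resource
# a browser would typically fetch while loading the page.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "link": "href",
}


class ResourceCollector(HTMLParser):
    """Collects (tag, url) pairs for tags listed in RESOURCE_ATTRS."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        attr_map = dict(attrs)
        # Simplification: only follow <link> elements that are stylesheets.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" not in rel_values:
            return
        url = attr_map.get(wanted)
        if url:
            self.resources.append((tag, url))


page = """
<html>
  <head>
    <link rel="stylesheet" href="style.css">
    <link rel="canonical" href="https://example.com/">
    <script src="app.js"></script>
  </head>
  <body>
    <img src="logo.png" alt="logo">
    <a href="other.html">a plain link, not fetched automatically</a>
  </body>
</html>
"""

collector = ResourceCollector()
collector.feed(page)
print(collector.resources)
```

The point of the sketch is that the decision depends on the combination of element and attribute (and sometimes attribute values such as rel), not on the tag name alone.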
Is it possible to use the HAP (HTML Agility Pack) to:
1. Grab a collection of nodes, e.g. all <a> elements which are children of <li> elements
2. Iterate over the collection
3. Add CSS class references into the class attribute for each element, e.g. class &= "foo"
4. Update the nodes in their original position within the HTML
For point 4, I need to know whether:
When I grab a collection of nodes, am I working with copies?
If so, can I easily update the nodes in their original position within the HTML
Finally, would it be practical to do this when rendering a page in an ASP.NET website, considering:
I will need to modify the class references for no more than 100 elements
I am not working with large HTML documents
I plan to select my nodes starting at a div, e.g. div[2] where body contains 4 divs
I realise that this may seem like a bunch of separate questions but really it is just a breakdown of the following two questions:
Can I easily modify the HTML output of an ASP.NET page e.g. to insert class references?
Would it be practical to do this on 50 - 100 elements with respect to speed, e.g. at a cost of no more than 2 seconds?
Many thanks.
Check out my CsQuery project: https://github.com/jamietre/csquery or on nuget as "CsQuery".
This is a C# (.NET 4) port of jQuery. Selectors are orders of magnitude faster than HTML Agility Pack; in fact, my initial purpose in writing it was to do exactly what you want to do: manipulate HTML in real time. In my case, the HTML came from a CMS and was generated by CKEditor.
To intercept HTML in webforms with CsQuery you do this in the page codebehind:
using CsQuery;
using CsQuery.Web;
protected override void Render(HtmlTextWriter writer)
{
    // The CsQueryHttpContext object is part of the CsQuery library; it's a helper
    // that abstracts the process of intercepting base.Render() for you.
    CsQueryHttpContext csqContext =
        WebForms.CreateFromRender(Page, base.Render, writer);
    // A CQ object is like a jQuery object. The "Dom" property of the context
    // returned above represents the output of this page.
    CQ doc = csqContext.Dom;
    doc["li > a"].AddClass("foo");
    // Write it.
    csqContext.Render();
}
There is basic documentation on GitHub, but apart from getting HTML in and out, it works pretty much like jQuery. The WebForms object above is just to help you handle interacting with the HtmlTextWriter object and the Render method. The general-purpose usage is very simple:
var doc = CQ.Create(htmlString);
// or
var doc = CQ.CreateFromUrl(url);
// ... do stuff with doc, a CQ object that acts like a jQuery object
string html = doc.Render();
Don't do that! ASP.NET is not meant to be used that way. There are better ways to do this task, depending on how you create the markup in which you want to change or add CSS classes. ASP.NET uses .aspx templates, which are basically HTML markup in which you can intervene with code executing on the server. There you can set a CSS class statically, or use server-side scripts to set the CSS class on the markup with some code.
You can also create controls in the code-behind and set the CSS class on a control if, for example, the anchor control has a parent that is a list item control (you will have to use server-side controls).
To do it your way, you will have to write a Response Filter (example here) and, after the request is done, do your parsing and write the results and changes back to the response stream. It's much easier to use common ASP.NET techniques.
I always hear people talk about DOM this, manipulate the DOM, change the DOM, traverse the DOM; but what exactly does this mean?
What is the DOM and why would I want to do something with it?
The DOM is basically an API you use to interface with the document, and it is available as a library in many languages (JS is one of those languages). The browser converts all the HTML in your web page to a tree based on the nesting. Pop open Firebug and look at the HTML structure. That is the tree I'm talking about.
If you want to change any HTML you can interact with the DOM API in order to do so.
<html>
<head><script src="file.js"></script></head>
<body>blah</body>
</html>
In file.js I can reference the body using:
onload = function() {
    document.getElementsByTagName('body')[0].style.display='none';
}
The getElementsByTagName is a method of the document object. I am manipulating the body element, which is a DOM element. If I wanted to traverse and find say, a span I can do this:
onload = function() {
    var els = document.getElementsByTagName('*');
    for ( var i = els.length; i--; ) {
        if ( els[i].nodeType == 1 && els[i].nodeName.toLowerCase() == 'span' ) {
            alert( els[i] )
        }
    }
}
I am traversing the nodeList given back by getElementsByTagName in the snippet above, and looking for a span based on the nodeName property.
It means working with the Document Object Model, which is an API for working with XML-like documents.
From w3 on the DOM:
The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page. This is an overview of DOM-related materials here at W3C and around the web.
One of the most commonly used functions in DOM work is:
getElementById
Manipulating/changing the DOM means using this API to change the document (adding elements, removing elements, moving elements around, etc.).
Traversing the DOM means navigating it: selecting specific elements, iterating over groups of elements, etc.
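To make both terms concrete, here is a minimal sketch using Python's standard xml.dom.minidom module, which implements the same language-neutral DOM interface outside the browser. The markup and the element_names helper are made up for the example:

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<html><body><p>Hello</p><span>one</span><span>two</span></body></html>"
)


def element_names(node):
    """Traverse the tree depth-first and collect the element names."""
    names = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            names.append(child.nodeName)
            names.extend(element_names(child))
    return names


# Traversing: visit every element below the document node.
names = element_names(doc)

# Manipulating: add a new element and remove an existing one.
body = doc.getElementsByTagName("body")[0]
new_span = doc.createElement("span")
new_span.appendChild(doc.createTextNode("three"))
body.appendChild(new_span)
body.removeChild(doc.getElementsByTagName("p")[0])

span_texts = [span.firstChild.data for span in doc.getElementsByTagName("span")]

print(names)
print(span_texts)
```

In a browser the calls have the same names (createElement, appendChild, removeChild, getElementsByTagName), with the difference that changes to the page's document are reflected on screen.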
In short:
When a web page is loaded, the browser creates a Document Object Model of the page, which is an object oriented representation of an HTML document, that acts as an interface between JavaScript and the document itself and allows the creation of dynamic web pages.
Source: w3schools - HTML DOM
Document Object Model
This is the DOM: a model of an XML, HTML, or similar document. All of those terms mean parsing the document and/or making changes to it (usually with whatever tools are available, such as JavaScript or C#).
The best example of a DOM when people use those terms is the HTML document in a browser. You might want to manipulate the DOM in this case to add something to the web page.