What is a Document Base URL and Fallback Base URL? - html

I would like to ask the community to help me understand what is a Document Base URL and Fallback Base URL in terms of how they are defined in the HTML5's specification. Please note I would prefer to expect a more understandable definition in terms of the definitions in the specification. However, individual perceptions are also welcome.
Link for Document Base URL's definition.
Link for Fallback Base URL's definition.
To me the definition for these two in the HTML5's specification kind of looks like having a circular reference.

You obviously need to understand recursion to understand recursion... ;) – These kinds of specifications are often self-referential. In the end they're very specific steps described in excruciating detail; they're essentially pseudo-code written in English as the programming language. You just need to follow them step by step to get to the result. One part referring to another term is like calling a function in code; they may even call each other, as long as there's no infinite loop in the end that's fine.
In this case it's not really that bad. The fallback base URL describes how to resolve a document's URL assuming cases where it may be a child of another document like an iframe, in which case it falls back to the parent's URL.
The document base URL resolves a document's URL taking <base> elements into account.
In summary, the specification is:
The document base URL is either the document's URL or the appropriately resolved <base> URL.
If the document is an iframe, take the parent's document base URL (see above).
Otherwise if it's about:blank but has a parent, take the parent's document base URL (see above). (This is a real niche case, but required for completeness.)
Otherwise it's the documents's address.
If the specification is henceforth talking about the document base URL, it just means step 1., the document's own URL, possibly resolved against <base>. If the specification is talking about the fallback base URL, follow all these steps.

Related

What's the difference between the two css link below, and what are the purposes of the two links? [duplicate]

I just learned from a colleague that omitting the "http | https" part of a URL in a link will make that URL use whatever scheme the page it's on uses.
So for example, if my page is accessed at http://www.example.com and I have a link (notice the '//' at the front):
Google
That link will go to http://www.google.com.
But if I access the page at https://www.example.com with the same link, it will go to https://www.google.com
I wanted to look online for more information about this, but I'm having trouble thinking of a good search phrase. If I search for "URLs without HTTP" the pages returned are about urls with this form: "www.example.com", which is not what I'm looking for.
Would you call that a schemeless URL? A protocol-less URL?
Does this work in all browsers? I tested it in FF and IE 8 and it worked in both. Is this part of a standard, or should I test more browsers?
Protocol relative URL
You may receive unusual security warnings in some browsers.
See also, Wikipedia Protocol-relative URLs for a brief definition.
At one time, it was recommended; but going forward, it should be avoided.
See also the Stack Overflow question Why use protocol-relative URLs at all?.
It is called network-path reference (the part that is missing is called scheme or protocol) defined in RFC3986 Section 4.2
4.2 Relative Reference
A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) to express a URI reference relative to the name space
of another hierarchical URI.
relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
/ path-absolute
/ path-noscheme
/ path-empty
The URI referred to by a relative reference, also known as the target URI, is obtained by applying the reference resolution
algorithm of Section 5.
A relative reference that begins with two slash characters is
termed a network-path reference (emphasis mine); such references are rarely used.
A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative- path reference.

Replace iframe.src attribute with javascript comment holding value

I am looking for a way to replace the content of the src attribute for an iframe with a dummy variant containing the original src value (but will not actually fetch anything). I am loading the html code via Ajax so I can change the src-attribute before the code is injected into the DOM - so I don't need help with that part. What I would appreciate feedback on is what to put in the src attribute. There is a related post here discussing what can go in the src attribute, but in contrast to this post, I want to store data (namely the original src value) so that I can extract it later. It seems the alternatives are:
src="javascript:/*http://originalsrcvalue.com*/"
src="about:blank/*http://originalsrcvalue.com*/"
src="#http://originalsrcvalue.com"
I am leaning towards the last variant using bookmarks. I'm looking for feedback on potential problems or cross-browser issues that might arise or suggestions for alternative solutions.
Edit: One way of addressing the problem is to use custom attributes - and this is probably what I'll end up using in this specific case. However, I would also like feedback on ways to store data in src-tags in the fashion showed above.
You could store the actual URL to a data-your-data-name attribute and fetch it with Javascript when you need it, by doing element.getAttribute('data-your-data-name') or if you don't care much about IE users, by element.dataset.yourDataName
References:
https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Using_data_attributes
https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/dataset

Is it possible to nest one data: URI inside another?

If I use a data URI to construct a src attribute for an HTML element, can it in turn have another data URI inside it?
I know you can't use data uri's for iframes (I'm actually trying to construct an OSDX document and pass it to the browser with an icon encoded in base64 but that's a really niche use case and this is more of a general question), but assuming you could, my use case would look like:
var iframe = document.createElement('iframe');
var icon = document.createElement('image');
var iSrc = 'data:image/png;base64,/*[REALLY LONG STRING]*/';
iframe.src='data:text/html,<html><body><image src="'+iSrc+'" /></body</html>
document.body.appendChild(iframe);
Basically what I'm after is is there anything in a data uri that would break a parent data uri?
Yes you can. I really thought it was impossible, as did everyone I asked.
Example:
Pasting the following into your browser's URL bar should render a gmail logo in an html page that says hello world.
data:text/html,<html><body><p>hello world</p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAAz1BMVEX///+6xs7iTULj5+vW2tzk49TX3+Tq6tz09Orz9/jm7O6wu8aNm6WwMCjfRDnm5ube2Mmos7vt7++jrLP29/eyTk3d4+fj6+foWFLM1t2bNzeZpa2trKPv8/ajo6Xud2f0r6rv7uG3ubvd6eXIysq9s6jeopfSUknx8fHbXGKvTx7Hz9X09PTwhXHw9Pb++/fp7/Xu4d3SZlPSiXjuaFmMkZX39/vZy6/j8+v2+/v76uP7+/v0/Pewl4raMi/f5+7KTjitf4W2rI3/9vDzz8uC91bdAAAAAXRSTlMAQObYZgAAAmBJREFUeF6VldeunDAQQNeF3jvbe729p/f//6aMxzaBKHe5OZiVJY6OBvbBAyTvZaDJSy/xzpJ4pdJzz1tfOme5LD0vly45fETe4/1vUoK26bnPznUPzrPrmSJsgtrPpQnpPCmvp2/gukyE/HX6JtYor9/kpqUcY3ogvUxv5RhlSrbHenFzY/+0cxtYApZlGZod3W9JaqJspuSR1vTqw41U4UZb+S/zMAz35FbJt4QK6sUnW+naxWwoaJXBpBi37bxTjuchyl+0zLGs4npmVK1dHSLBiLhmlg+chLukFiLqV3dQ1i84p1SEv42CgLhYzmR5vqh1XIUXeypGoN+LAMoVz5yBk/EKZfsOQioOsnGFWXr/eXLCMs9EeehK2bab+NKCbQj2/mEymRSjpqzkR5CXOv4AWSqyM3BnRSDKw255CWtBW0BWcILyCmQsMywv5b/xS8WpzGJZzMywPFZjvANTYO10VoNfIxqOW2WlwhU/QnbSMDu1y0yWpSowdiqry6OArEW5ka1GNTaGsWqViwDL4z/lezClu4nBN+K/ypWSIywrNY4NxaqZWZSjThlEkQXLUna8lTZ+unWnDE6T1fbmtZlBxWxbFvHZ5NR8DUeUj5QejQ1mu24cv2C5aJW3R3qv1a4bX8TbIih+wAv6IPvDKCIrAusV8FEpyz5njFUV/ERdWFTBHVWwmGkyKKPcT2kyjvKFy1jPgnJ6IeTMg30vzCUZHBTc52mvzdKhz+GYcJIxTw8ukOKlVmd/SPk4kSdQ4nv8fJh7PriIw7Mn/yxPGW8dm06e269eOTwe/De/AW24fWb7B21TAAAAAElFTkSuQmCC" /></body></html>
or for a shorter example courtesy of Pumbaa80:
data:text/html,<script src="data:text/javascript,alert('hello world')"></script>
MSDN explicitly supports this:
Data URIs can be nested.
An old blog entry talks a little bit more about embedding images within CSS using data: :
Neither dataURI spec nor any other mentions if dataURI’es can not be nested. So here’s the testcase where dataURI’ed CSS has dataURI’ed image embedded. IE8b1, Firefox3 and Safari applied the stylesheet and showed the image, Opera9.50 (build 9613) applies the stylesheet but doesn’t show the embedded image! So it seems that Opera9 doesn’t expect to get anything embedded inside of an already embedded resource! :D
But funny thing, as IE8b1 supports expressions and also supports nested data URI’es, it has the same potential security flaw as Firefox does (as described in the section above). See the testcase — embedded CSS has the following code: body { background: expression(a()); } which calls function a() defined in the javascript of the main page, and this function is called every time the expression is reevaluated. Though IE8b1 has limited expressions support (which is going to be explained in a separate post) you can’t use any code as the expression value, but you can only call already defined functions or use direct string values. So in order to exploit this feature we need to have a ready javascript function already located on the page and then we can just call it from the expression embedded in the stylesheet. That’s not very trivial obviously, but if you have a website that allows people to specify their own stylesheets and you want to be on the safe side, you have to either make sure you don’t have a javascript function that can cause any potential harm or filter expressions from people’s stylesheets.

Why Does this relative URL work?

So I've been playing with writing some web crawlers and testing them on different sites. But I've come across some sites that seem like their relative urls should not work, or at least I think they should point to someplace other than where the browser resolves them to.
Given a url of a current page : "http://www.examplesite.com/a/page.htm"
And a link of: "a/page2.htm"
The browser correctly resolves this as: "http://www.examplesite.com/a/page2.htm"
My problem/feeling (obviously wrong, but I'm wondering why) is that this should resolve to "http://www.examplesite.com/a/a/page2.htm". The relative url does not begin with a /, so why does it become base relative?
Interestingly, Java's URL class appears to agree with me, as the following code will output : "http://www.examplesite.com/a/a/page2.htm"
URL baseUrl = new URL("http://www.examplesite.com/a/page.htm");
URL absoluteURL = new URL(baseURL,"a/page2.htm");
Why does this link resolve the way it does, and what is the formal rule for resolving a relative link like this?
EDIT:
I just notice that in the <head> portion of the webpage there is a field like so:
<base href="http://examplesite.com/">
I'm assuming that this overrides any relative links to use that as its base url instead of the actual url. Is this a correct assumption? Is that even a valid html markup?
You are correct in that it is the base tag, and yes it is valid.
In HTML, links and references to external images, applets,
form-processing programs, style sheets, etc. are always specified by a
URI. Relative URIs are resolved according to a base URI, which may
come from a variety of sources. The BASE element allows authors to
specify a document's base URI explicitly.
When present, the BASE element must appear in the HEAD section of an
HTML document, before any element that refers to an external source.
The path information specified by the BASE element only affects URIs
in the document where the element appears.
Sources: W3C Wiki and W3C Markup
The site is likely using a <base> tag to specify the parent as the prefix to all relative URL's on the site.
You can find out more on the base tag here. If this is not the case, then please provide the source URL as this defies normal behavior.

Link to a section of a webpage

I want to make a link that when clicked, sends you to a certain line on the page (or another page). I know this is possible, but how do I do it?
your jump link looks like this
jump link
Then make
<div id="div_id"></div>
the jump link will take you to that div
Hashtags at the end of the URL bring a visitor to the element with the ID: e.g.
http://stackoverflow.com/questions/8424785/link-to-a-section-of-a-webpage#answers
Would bring you to where the DIV with the ID 'answers' begins. Also, you can use the name attribute in anchor tags, to create the same effect.
Resource
The fragment identifier (also known as: Fragment IDs, Anchor Identifiers, Named Anchors) introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document.
Link to fragment identifier
Syntax for URIs also allows an optional query part introduced by a question mark ?. In URIs with a query and a fragment the fragment follows the query.
Link to fragment with a query
When a Web browser requests a resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent (Web browser) processes the resource according to the document type and fragment value.
Named Anchors <a name="fragment"> are deprecated in XHTML 1.0, the ID attribute is the suggested replacement. <div id="fragment"></div>
If you are a user and not a site developer, you can do it as follows:
https://example.com/index.html#:~:text=foo
Simple:
Use <section>.
and use Visit the Useful Tips Section
w3school.com/html_links