rel-canonical prev/next when head can not be written - html

Were working in a component(module)/template framework.
There is only one template per page, this defines the basic structure and layout. the HEAD area is defined here.
Now, many of our components(modules) include some concept of pagination.
Thus, it's desired to use rel-next and rel-prev in the head of the document.
The problem comes from the template is (and cannot) be aware of the component that provides pagination. They are 100% completely decoupled.
Once the component is run, the head part of the page is typically flushed already.
It's just a limitation of the framework.
since placing the links in the BODY (where the component(module) renders) will not achieve the correct results (i.e. Google ignores it unless in the head).
Can anyone think of an approach or work-around to this issue?

You could send HTTP Link headers instead:
Link: <http://www.example.com/favorite-books/everything-on-one-page>; rel="canonical"
Link: <http://www.example.com/favorite-books/page-1>; rel="prev"
Link: <http://www.example.com/favorite-books/page-3>; rel="next"
According to Google’s documentation about their usage of canonical, it’s supported.
While support isn’t mentioned on Google’s documentation about their usage of prev/next, support was confirmed in a thread on their product forums.

According to Google the default option is to nothing - "Leave whatever you have exactly as-is. Paginated content exists throughout the web and we’ll continue to strive to give searchers the best result, regardless of the page’s rel=”next”/rel=”prev” HTML markup—or lack thereof."
Other more up to date mentions of the issue suggest the advice hasn't change:
Video about pagination
Infinite scroll

Related

Link: Response Header VS HTML

I am currently working on a function to assist in preparing Link: HTTP header or a set of <link> tags and while reading different materials on this, I still am not able to find an answer to simple question: when to use Link: header and when to use <link>.
So far I can only say, that if you want to use HTTP20 server push, it is recommended to utilize the header. On the other hand, even if I push a stylesheet, it will not be applied unless there is a respective tag in HTML output.
Since I am preparing the library in order to help with some standardization and sanitization, I would like to catch, at least, some "weird" cases like this, if it's possible, but for that I need some set of recommendations or best practices in that regard. Sadly I am unable to find any thus far, so am turning to more knowledgeable people: what best practices or weird cases should I consider catching or should I just allow whatever to be sent regardless of whether it's a header or a tag?
If anyone is interested, the code is present in https://github.com/Simbiat/HTTP20/blob/main/src/Headers.php (links function).
They are supposed to be equivalent as #Evert states so in theory you can use either. However there are some considerations:
Headers are usually set in web server config (at least for static pages) which may not be as easy to update for developers.
However it has the added advantage that you can set these for multiple pages all at once (e.g. preload your core fonts on every .html file, rather than having to remember to set this on all pages, or all page templates if using a CMS).
On the other side with the HTML version it’s often easier to configure it per page (or page template), if you have different needs (e.g. different fonts are used in different pages).
There’s also some which say there are slight performance considerations to doing it in the header but honestly, as long as it’s high enough in the <HEAD> element I really think you’d struggle to notice this.
Of perhaps of more importance is whether it’s passed on hop to hop if your web server is hidden behind other infrastructure (e.g. a CDN or other proxy). In theory it should be, for simple headers, but for things like HTTP/2 push that’s not so easy. If it’s in the HTML you don’t need to worry about this (assuming intermediaries are not changing the markup of course!).
You mentioned the HTTP/2 push use case and that definitely needs the header (though this is not a defined standard method of setting push and some servers or CDNs use other methods, but many use this). However given HTTP/2 push’s complexities and concerns it can cause more problems than it solves, this is maybe a reason to recommend the HTML method to ensure it’s never pushed.
All in all I recommend setting this in the HTML. It’s just easier.
This is not the case however with other, similar things, which can be set in HTML and HTTP headers. CSP for example is limited in the HTML version, lacking some features of the HTTP Header version, and is also not recommended as it could be altered with JavaScript whereas the HTTP header cannot. But for simple Link headers these are less of a concern.

CSS selector fragment support

Fragments (or hashes) in URL are widely used to specify some specific fragment in a document.
For example, the fragment below
http://example.com/page.pnp#<fragment>
Usually references something like <div id="<fragment>" /> or <a name="<fragment>" /> in a HTML document.
There is a standard to support CSS selectors as a fragment, like so:
http://example.com/page.pnp#css(<CSS selector>)
Are there any applications using it? Would it be nice for browser to support it? For example, browser could display only the selected fragments of the page or highlight the selected fragments. Or provide an option for developers to highlight the selected fragments with CSS or JS. Can somebody submit it to the relevant browser devs as a feature request?
What are other ways to reference specific content in a HTML page? For example, if I want to comment on some specific element in a HTML page, what are any other ways to specify that position in the document, preferably by using URI, or some other convenient identifier?
The document you link to isn't really a standard; it even says "Unofficial Draft" in the subtitle, and under where it says Status:
This document is merely a public working draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.
... so it is completely inappropriate to refer to it as a "standard". A better term for this would be "concept" or "experiment".
That being said, rudimentary implementations exist (or at least, they did at the time it was first published) in the form of browser extensions; you can find links to these in section 8.
AFAIK, though, there hasn't been any activity around this at all after the first few months since the community group for this was formed and I joined. Either it never gained traction or it just wasn't very feasible to implement after all.
For now, as always, fragment identifiers can only point to elements with the respective id attribute, or named anchors. It seems it'll remain that way for the foreseeable future.
For Chrome, There's a Jquery Fragement selector extension:
With the advent of edge extensions it will likely become easy to implement in edge.

Optimize CSS Delivery - a suggestion by Google

Google suggests to use very important CSS inline in head and other CSS inside <noscript><link rel="stylesheet" href="small.css"></noscript>.
This raises few questions in my mind:
How to prioritize CSS in two files. Everything for that page looks important. Display, font etc. If I move it to bottom then how it helps page render. Wont it cause repaint, etc?
Is that CSS is required after Document ready event? Got it from here.
How 'CSS can' go inside <noscript></noscript>, which is for script? Will it work when JavaScript is enabled? Is it browsers compatible?
Reference
Based on my reading of the link given in the question:
Choose which CSS declarations are inlined based on eliminating the Flash-of-Unstyled-Content effect. So, ensure that all page elements are the correct size and colour. (Of course, this will be impossible if you use web-fonts.)
Since the CSS which is not inlined is deferrable, you can load it whenever makes sense. Loading it on DOMContentReady, in my opinion, goes against the point of this optimisation: launching new HTTP requests before the document is completely loaded will potentially slow the rest of the page load. Also, see my next point:
The example shows the CSS in a noscript tag as a fallback. Below the example, the page states
The original small.css is loaded after onload of the page.
i.e. using javascript.
If I could add my own personal opinion to this piece:
this optimisation seems especially harmful to code readability: style sheets don't belong in noscript tags and, as pointed out in the comments, it doesn't pass validation.
It will break any potential future enhancements to HTTP (or other protocol) requests, since the network transaction is hard-coded through javascript.
Finally, under what circumstances would you get a performance gain? Perhaps if your page loads a lot of initially-hidden content; however I would hope that the browser itself is able to optimise the page load better than this hack can.
Take this with a grain of salt, however. I would hesitate to say that Google doesn't know what they're doing.
Edit: note on flash-of-unstyled-content (abbreviated FOUC)
Say you a block of text spanning multiple lines, and includes some text with custom styling, say <span class="my-class">. Now, say that your CSS will set .my-class { font-weight:bold }. If that CSS is not part of the inline style sheet, .my-class will suddenly become bold after the deferred loading has finished. The text block may reflow, and might also change size if it requires an extra line.
So, rather than a flash of totally-unstyled content, you have a flash of partly-styled content.
For this reason you should be careful when considering what CSS is deferred. A safe approach would be to only defer CSS which is used to display content which is itself deferred, for example hidden elements which are displayed after user interaction.

Embed sandboxed HTML on a webpage + SEO

I want to embed some HTML on my website... I would like that:
SEO: that content can be crawled and indexed
Integration: it renders nicely (does not break my DOM trees for instance, or does not inherit my styles)
Security: it remains safe for our user (javascript disabled)
Flexibility: the HTML can be completely free (don't want any BBCode or MarkDown or even TinyMCE, it's our users that are writing the HTML code...)
I saw that I might be able to use the IFrame for that, but I am not sure it is a very good solution concerning my SEO constraint.
Any answer would be greatly appreciated!!! Thanks.
For your requirements (rendering and security, primarily), IFRAME seems to be your only option, especially when we consider no rules are specified for the HTML content except the JS removal. Even some CSS + 'a' tag can bring a serious security risk, like overlaying outgoing links on your standard interface.
For the SEO part, you can use SEO maps to show the search engines the relation between the content and the container, also use html tags like link to make connection.
To make sure the user's html is safe then you should use HTMLPurifer. In terms of the rest of the question, you should split this up into multiple questions.

What rel values have concrete benefits in a <link> tag?

The <link> tag is most commonly used to associate stylesheets with HTML documents, but as many know, it has a myriad of other uses as well. In general it represents some relationship between two documents. Its beauty and curse is that anyone can make their own rel values (relationship types) if they wish so. W3C has listed some possibilities, other people have invented even more, and if I want to there's nothing stopping me from adding a <link rel="unicorns>" to my webpage. It will even validate.
However adding random <link> tags to a webpage only wastes bandwidth. What I want to know which rel values actually provide some functionality. And not just some hypothetical functionality that some future user agent might implement, but actual concrete benefits that my users can feel today.
A few that I already know are:
stylesheet - the most common use, of course. Attaching CSS stylesheets.
canonical - instructs Google (and other search engines) where the "normal" or "canonical" URL of the page is (in case you can view the same page by many URLs);
icon - specifies the favicon which browsers show in the URL bar and next to bookmarks.
home, index, contents, search, glossary, help, first, start, prev, previous, next, last, up, copyright, author - These appear in Opera's navigation bar and (I'm told) in SeaMonkey plugin for Firefox. Additionally, Firefox preloads in background the <link rel="next"> page and Opera navigates there when you press SPACE at the bottom of current page. I believe I might have seen them in Opera Mini as well, but I'm not sure (does anyone know how these affect mobile browsers?)
pingback - used to implement Pingback in blogs.
Are there any other possible <link> tags that actually do something? (Please also include what they do in your answer)
"alternate", at least as a modifier to "stylesheet" does something, in that the stylesheet will initially be in the "disabled" state.
<link rel="prefetch"...> will convince some browsers to preload the page linked to. Used correctly, it can greatly reduce perceived load times along the expected path through your site.
Firefox (for one) uses this today, and Google actually utilizes it. (The first search result, particularly if it's to a common site like Wikipedia, will often include a "prefetch" link to the page found.)