I have a web page as a simple string. I'm trying to make an HTMLDocument out of it using the HTMLFile COM object:
$page = @'
<!DOCTYPE html><html lang="en">
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<title>A Title</title>
<script src="/index.js"></script>
</head>
<body>
<div class="findme">Hello</div>
<div>Nope</div>
</body>
</html>
'@
$bytes = [System.Text.Encoding]::Unicode.GetBytes($page);
$html = New-Object -Com "HTMLFile";
try {
# Powershell 4
$html.IHTMLDocument2_write($bytes);
} catch {
# Powershell 5
$html.write($bytes);
}
write $html.documentElement.innerHTML;
As written, this outputs
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>A Title</title>
<script src="/index.js"></script></head>
Note that the entire <body> tag is gone. If I change the meta tag from IE=edge (or IE=11) to IE=9, the output is correct and includes the <body> tag and all its contents. The crazy part is that it works with IE=edge if I take out the <script> tag, or if it's an inline script rather than a src= reference.
Did I do something wrong? The HTML passes the W3C validator so I don't think it's faulty markup. Is there a different mode I can set on the document before the call to .write() to have it parse successfully even with the IE=edge meta-tag?
PS: I wish this wasn't relevant, but it might matter that I don't have Office installed on this machine, because I think that would provide a different implementation for the COM object. Win10 18363, Powershell 5.1.
I want to include a shared header in my HTML pages so that I don't have to edit every page when the header changes, since the header can easily grow.
Problem:
Expected text output: "Schüttgut"
Received text output: "Schüttgut"
Here is what I did:
Creating a header - saved as inc_head.html:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
Including w3 at the end of the index.html page
<script src="https://www.w3schools.com/lib/w3.js"></script>
<script>
w3.includeHTML();
</script>
Setting the head-part in the index-page like
<head>
<div w3-include-html="inc_head.html"></div>
</head>
Opening the page shows that the UTF-8 encoding is not working: I get "ü" instead of "ü".
When I replace the w3-include part with the two header lines directly, it works correctly.
Any idea why the include fails for these special characters?
Thank you
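The garbling reported above is the pattern produced when UTF-8 bytes are decoded as a single-byte encoding. A likely explanation, though not confirmed in the question, is that a charset declaration injected by script arrives after the browser has already chosen an encoding for index.html, so it has no effect. Here is a minimal sketch (Node.js, using only the built-in Buffer; the variable names are illustrative) that reproduces the exact output from the question:

```javascript
// The intended text, written with an escape so the example does not
// depend on how this file itself is saved.
const intended = "Sch\u00fcttgut"; // "Schüttgut"

// Encode as UTF-8 (ü becomes the two bytes C3 BC) ...
const utf8Bytes = Buffer.from(intended, "utf8");

// ... then decode those bytes as Latin-1, one character per byte.
const misread = utf8Bytes.toString("latin1");
console.log(misread);

// Reversing the mistake recovers the original text.
const repaired = Buffer.from(misread, "latin1").toString("utf8");
console.log(repaired === intended);
```

If this is the cause, the text in inc_head.html is not damaged; the page only needs its charset declared directly in index.html rather than through the include.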
I'm just trying to post a simple html file consisting mainly of some prose I wrote inside of <pre> elements.
Interestingly, when I view the file on my computer with my browser, the quotation marks display fine. But when I upload it to my website, quotation marks are rendered as something like “ or â€. I have looked around the web for solutions but they were few and far between.
I tried to use the meta tag and included
<meta http-equiv="Content-Type" content="text/html; charset="utf-8" />
to my header but to no avail. Any ideas on how to solve this? It just wouldn't make sense to go back to the content inside the elements and code it into html as the prose is a draft and will go through many changes in the future.
The <!doctype html> tag indicates the file is HTML5 - so the browser will render it as such. lang="en" should be set to the language you are working with. Be sure to use the <meta charset="utf-8"> tag to set the character set in the <head>
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Template</title>
</head>
<body>
<pre>This is my stuff</pre>
</body>
</html>
Check your code with the browser's View Source and use the Validator at https://validator.w3.org/ to check the page.
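Sequences like “ are what curly quotation marks look like when their UTF-8 bytes are decoded as windows-1252. That suggests the uploaded page is being read with a different charset than the file was saved in. One possible cause is a charset in the server's HTTP Content-Type header, which takes precedence over the meta tag, but that is an assumption about the hosting setup. A small sketch (Node.js, assuming a build with full ICU so that the windows-1252 decoder is available) showing the effect:

```javascript
// UTF-8 bytes for the left curly quote U+201C: e2 80 9c.
const openQuote = Buffer.from("\u201c", "utf8");

// Decode them the way a browser would if told the page is windows-1252.
const wrongDecoder = new TextDecoder("windows-1252");
const garbledOpen = wrongDecoder.decode(openQuote);
console.log(garbledOpen);

// Decoding the same bytes as UTF-8 gives the intended character back.
const rightDecoder = new TextDecoder("utf-8");
const intendedOpen = rightDecoder.decode(openQuote);
console.log(intendedOpen === "\u201c");
```

The response headers shown in the browser's developer tools reveal which charset, if any, the server actually sends.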
Here is what I tried.
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
<pre>Einstein said,"Once you stop learning, you start dying"</pre>
</body>
</html>
I also tried only this
<body>
<pre>Einstein said,"Once you stop learning, you start dying"</pre>
</body>
It still works.
I have created an Angular Universal app that consists of nested components.
I need Prerender.io to return the Server-Side-Rendered HTML from my app but Prerender.io only returns the top level component tags and not the HTML within.
It seems that this happens to every Angular app (SSR or not). I would expect it to happen on a non-SSR page but not on an SSR one.
I followed the guidelines from prerender.io:
git clone https://github.com/prerender/prerender.git
cd prerender
npm install
node server.js
And then tried to go to:
http://localhost:3000/http://localhost:4000/test
The returned html I get is:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Documents SSR</title>
<base href="/">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="icon" type="image/x-icon" href="favicon.ico">
<link rel="stylesheet" href="styles.3ff695c00d717f2d2a11.css">
<style></style>
</head>
<body>
<app-root _nghost-sc0="" ng-version="8.2.3" _nghost-serverapp-c0="">
<router-outlet _ngcontent-serverapp-c0=""></router-outlet>
<component1>
<!---->
<component2></component2>
</component1>
</app-root>
</body>
</html>
I would expect Prerender.io to render everything within component2 but as you see, I get nothing.
Am I missing a configuration somewhere or doesn't Prerender.io support Angular?
If I go directly to http://localhost:4000/test I get the full rendered HTML.
EDIT
It looks like Prerender.io might not wait for my last AJAX call to finish before rendering?
Found the issue - don't use ShadowDom.
My component2 was using encapsulation: ViewEncapsulation.ShadowDom.
Removing that let Prerender give me the entire HTML.
I would like to stop a few of my pages from showing up in search results. My understanding is that I add the following to the <head> section of the page:
<meta name="robots" content="noindex,nofollow"/>
The problem is that my pages use a common Layout page. Something like:
@{
Layout = "~/Views/Shared/_VanillaLayout.cshtml";
}
Inside the layout page is the head section with a whole lot of links, scripts and meta tags. I don't want to duplicate this for indexable and non-indexable pages.
From my research I have found that: -
Having multiple <head> sections is bad.
Having the robot meta tag outside of head is bad.
Using robots.txt is more than I want and is bad.
Trying to pass a model into the layout is a bit of an overkill (need all models to inherit from some base and many pages are purely presentation and don't even have a model) and is bad.
Hopefully, I am missing something and there is a good (non-bad) way to do this or one of the approaches I have mentioned above is not so bad after all.
It seems to me the easiest way would be to define a section in the <head> tag of your layout file that you can choose to populate with data in your views
<head>
<meta charset="utf-8" />
<title>@ViewBag.Title - My ASP.NET MVC Application</title>
<link href="~/favicon.ico" rel="shortcut icon" type="image/x-icon" />
<meta name="viewport" content="width=device-width" />
<!-- Adding a RenderSection here, mark it as not required-->
@RenderSection("AdditionalMeta", false)
@Styles.Render("~/Content/css")
</head>
Now, in any view in which you need to add additional meta data, simply add the following code at the end/beginning (after model declarations) of your view file
@section AdditionalMeta
{
<meta name="robots" content="noindex,nofollow"/>
}
Since all of the Razor code is processed server-side, this avoids two problems: a) relying on JavaScript to append the tag, which fails for crawlers that do not run JavaScript, and b) appending to the <head> tag late. Also, because the section is marked as not required, you only have to update the pages that you do not want indexed, rather than set a variable on every single page in your application.
You can add the following conditional with the meta-tag to the <head> element in your common layout:
<!DOCTYPE html>
<html>
<head>
<title>@ViewBag.Title</title>
@if (PageData["DisableIndexing"])
{
<meta name="robots" content="noindex,nofollow"/>
}
...
</head>
<body>
...
</body>
That flag will be set as disabled by default in your main _ViewStart.cshtml file, the one in the Views folder. That would mean by default no page will add that meta tag. This will be the _ViewStart file:
@{
Layout = "~/Views/Shared/_VanillaLayout.cshtml";
PageData["DisableIndexing"] = false;
}
Finally, on pages where you want to disable indexing you just need to override that flag. For example if the Foo view should not allow indexing, you would do:
@model MyNamespace.MyFooModel
@{
ViewBag.Title = "Foo";
PageData["DisableIndexing"] = true;
}
...
If all the views within a certain folder should disable indexing, you could even add another _ViewStart.cshtml file to that folder where you would just set PageData["DisableIndexing"] = true;
As a side note, you could also use the ViewBag to pass data from the _ViewStart to the layout, but code is a bit ugly as you don't have direct access to the ViewBag in the ViewStart. See this answer if you prefer to use the ViewBag.
If you are not defining any meta tags in the layout page and simply want to add them from your page, you can do the following.
In your layout page _VanillaLayout.cshtml, use @RenderSection in the head section as follows:
<head>
<meta charset="utf-8">
@RenderSection("SeoRender", false)
</head>
Now, in your view page, do the following:
@{
Layout = "~/Views/Shared/_VanillaLayout.cshtml";
}
@section SeoRender{
@{
<title>testTitle</title>
<meta name="keyword" content="testkeyword">
<meta name="description" content="testdescription">
<meta name="author" content="testauthor">
}
}
This way you can define specific meta tags and other elements individually in each page.
Try it with jQuery: in the page that you don't want indexed, add
$('head').append('<meta name="robots" content="noindex,nofollow"/>');
Edit:
Another option (according to "Will Googlebot crawl changes to the DOM made with JavaScript?") is to use plain JavaScript instead of the jQuery library:
document.getElementsByTagName('head')[0].insertAdjacentHTML('beforeend', '<meta name="robots" content="noindex,nofollow"/>');
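Note that appendChild() takes a DOM node, not a markup string, so the tag has to exist as an element before it can be appended. A minimal sketch of building it explicitly (the function name addRobotsMeta is made up for illustration, and whether a given crawler honors a tag added by script is not guaranteed):

```javascript
// Build the robots meta tag as a real element and append it to <head>.
// `doc` is the document to modify; in a browser that is the global document.
function addRobotsMeta(doc) {
  const meta = doc.createElement("meta");
  meta.setAttribute("name", "robots");
  meta.setAttribute("content", "noindex,nofollow");
  doc.head.appendChild(meta);
  return meta;
}

// Only run automatically when a DOM is available (i.e. in a browser).
if (typeof document !== "undefined") {
  addRobotsMeta(document);
}
```

Taking the document as a parameter keeps the function usable outside a browser, for example in a test with a stand-in object.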
Old question, but in case it helps someone: in the top-level layout, I had to use:
<head>
@RenderSection("MySection", required:false)
</head>
Then, in every nested layout, I had to redefine my section:
@section MySection {
@RenderSection("MySection", false)
}
Finally, I defined my section in my .cshtml view:
@section MySection{
<meta name="robots" content="@Model.MetaContent"/>
@* or any other tag going to <head> *@
}
Again, old question, but I figured out a very simple way to do this:
Define some meta properties:
<meta property="og:type" content="website"/>
<meta property="og:url" content="https://illusive.azurewebsites.net/"/>
<meta property="og:image" content="https://illusive.azurewebsites.net/favicon.ico"/>
Define the custom properties that you want to differ on each page (in your parent _Layout.cshtml):
<meta property="og:title" content="@ViewData["Title"]"/>
<meta property="og:description" content="@ViewData["Description"]"/>
Define the ViewData elements for your different layouts:
(inside Index.cshtml)
@model IndexModel
@{
ViewData["Title"] = "Homepage";
ViewData["Description"] = "Home Page";
}
(inside Error.cshtml)
@model ErrorModel
@{
ViewData["Title"] = "Error";
ViewData["Description"] = "An error has occurred!";
}
This should act as a way to easily customise your meta tags without having to use sections or any tedious blocks of code.
I am using Dreamweaver to write XHTML documents. The code is valid in the program, but after I upload it, Inspect Element shows double <head> tags, while the source shown by right-click > View Source looks fine.
Is it because I'm using Dreamweaver? What could be wrong?
The first error is: "Extra <html> encountered. Migrating attributes back to the original <html> element and ignoring the tag." (on line 3)
The code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="keywords" content="the content of my doc" />
<meta name="description" content="this is an example document" />
<link rel="alternate" type="application/rss+xml" title="rss feeds" href="linkto/xml/feeds.xml" />
<!-- scripts -->
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<title>The Title</title>
</head>
<body>
<!-- content -->
</body>
</html>
Thank you very much.
No problem in Chromium 5.0.307.9 (Developer Build 39052) under Linux. I can't test it in Safari now.
EDIT: Proposed test case had nothing to do with this problem, neither could see any extra <head> tags. However, I looked at the Developer Tools of Safari and Chrome under Windows and Firebug in Firefox and all three rendered the DOM incorrectly. Just have a look at this picture and see that the first <link> tag has jumped into the body.
This problem also has nothing to do with Javascript because when turning off Javascript the result is the same, even more clear when comparing with the source code. Strange I didn't notice this under Linux.
The Developer Tools of the WebKit browsers give an even clearer picture (also notice the jQuery error message). I suspect the Unicode Byte-Order Mark (BOM) at the beginning of the file causing the problem: as you can see the BOM is moved to the <body> of the document, perhaps dragging several elements in the <head> with it. But also the unclosed <link> elements, as shown by the W3C validator, might give some issues, although browsers usually handle this without any problems. First get rid of the BOM in your file and see if the problem persists.
And I see another error: those tags beginning with <meta ... are called meta tags, not "meat tags". ;-)
You should have a title element. Whatever you write between the <title></title> tags will be displayed in the top bar of your browser.
Just make sure your
</head>
tag has the slash in the actual file you're working on. That's an easy typo.
To remove the BOM from your document, you can use this PHP function:
function removeBOM($str = "") {
    // Strip the three-byte UTF-8 BOM (EF BB BF) if the string starts with it.
    if (substr($str, 0, 3) === pack("CCC", 0xef, 0xbb, 0xbf)) {
        $str = substr($str, 3);
    }
    return $str;
}
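Before stripping anything, it can help to confirm that the file really starts with a BOM. A rough sketch of that check on raw bytes in Node.js (the helper names hasUtf8Bom and stripUtf8Bom are made up for illustration, and the input is assumed to be a Buffer):

```javascript
// The UTF-8 byte-order mark is the three bytes EF BB BF.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// True when the buffer begins with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
  return bytes.length >= 3 && bytes.subarray(0, 3).equals(UTF8_BOM);
}

// Returns the buffer without a leading BOM; unchanged if there is none.
function stripUtf8Bom(bytes) {
  return hasUtf8Bom(bytes) ? bytes.subarray(3) : bytes;
}

// Example: a document that starts with a BOM.
const withBom = Buffer.concat([UTF8_BOM, Buffer.from("<!DOCTYPE html>", "utf8")]);
console.log(hasUtf8Bom(withBom));
console.log(stripUtf8Bom(withBom).toString("utf8"));
```

Working on bytes rather than a decoded string avoids depending on how the file was decoded in the first place.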