Open Graph share debugger scrapes empty HTML

I'm trying to set up Open Graph meta tags for a website. When I access the site normally using a browser and inspect the source, the tags are there. However, they don't show up when I use the OG debugger.
The site that I'm developing is here: spurafrika-org.vercel.app (a Next.js site). It's replacing the original site here: spurafrika.org (a WordPress site).
When I use the "See exactly what our scraper sees for your URL" feature, I get this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head>
<body><p>ÿþ</p></body>
</html>
See for yourself here. It's vastly different from the actual source of my websites.
I originally thought it might have been a Vercel/Next.js issue, but when I discovered it also happening on the WordPress site, I was very confused: see this. I've checked other sites developed with Next.js and WordPress - the meta tags work fine on the debugger.
Another point of confusion is that the debugger tool seems to be able to pick up that on my Next.js site I've listed https://spurafrika.org as its canonical URL, which it can only tell through my og:url tag. Yet when I view what the debugger supposedly sees, it shows the above empty HTML snippet.
I thought perhaps it might have been an encoding or parsing issue, but I've validated my HTML source using several tools and there are no problems.
I'm stumped. Anyone know why this is happening?

I copied your code into a code sandbox: https://developers.facebook.com/tools/debug/?q=https%3A%2F%2Fkzi2c.csb.app%2F
Initially, keeping the og:url as https://spurafrika.org/ caused warnings and og:description not getting picked up; once I pointed it to the actual URL, it all got fixed.
Changing the og:url to the right URL may fix it; give it a shot and let us know.
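For reference, here's a minimal sketch of head tags where og:url matches the URL actually being scraped (the values are illustrative, not taken from the real site):
<head>
<!-- og:url should match the URL you feed to the scraper/debugger -->
<meta property="og:url" content="https://spurafrika-org.vercel.app/" />
<meta property="og:title" content="Spur Afrika" />
<meta property="og:description" content="..." />
<meta property="og:image" content="https://spurafrika-org.vercel.app/og-image.jpg" />
</head>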

Related

CSS not displaying properly in SharePoint on Edge Browser (SEC7111 Error)

Hopefully I can explain this correctly. I have recently been moved to a Windows 10 VM from Windows 7 and I'm trying to get a site for my team at work to display properly in Edge. I have a WebPart linking to CSS that is displaying everything as one large list instead of a table with dropdowns. When I open the HTML page on its own in Edge it displays fine, but with code in SharePoint it is not working correctly. Any ideas of why this could happen?
(Screenshot: what should display)
(Screenshot: what is displaying in SharePoint)
EDIT
After opening developer tools I find that I am receiving a SEC7111 error code on my CSS file that is being linked. Looking other places for solutions to this too, but any help is greatly appreciated!
FINAL EDIT
With the SEC7111 error, I found out that the "file://" links I used for the CSS weren't going to work because they weren't considered "secure" (oddly, I got the same error in IE but never had this display issue there). So, I moved my linked CSS file to a secure folder in another SharePoint site I have, linked the CSS from there, and now it's working!
There are a few ways to solve your problem (it's better to share your code within your question to get a better answer). I offer the solutions below:
Solution 1
Please don't use file:// for a site published on a web server. The HTML is rendered on the client, so it cannot access local files, which is why you should not use file://. You can read more about the security concerns and other details of the file protocol here: https://en.wikipedia.org/wiki/File_URI_scheme
Instead of the local file protocol, you can use an absolute or relative path to your CSS over HTTP/HTTPS.
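As a sketch (the paths are invented for illustration), the change is from a local file reference to a server path:
<!-- Blocked (SEC7111): CSS linked via the local file protocol -->
<!-- <link rel="stylesheet" href="file://C:/Users/me/css/team.css"> -->
<!-- Works: CSS served over HTTP/HTTPS from the site itself -->
<link rel="stylesheet" href="/SiteAssets/css/team.css">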
Solution 2
Add an X-UA-Compatible meta tag or HTTP response header to force IE to run with a legacy document mode: 5, 7, or 8.
X-UA-Compatible meta tag:
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=8" />
...
</head>
<body>
</body>
</html>
X-UA-Compatible HTTP response header:
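For example, the raw response header equivalent of the meta tag above is:
X-UA-Compatible: IE=8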

Twitter-Card Meta Tag Issue

URL in question: https://www.halleonard.com/viewpressreleasedetail.action?releaseid=10261
When you view the source in a browser and in Developer Tools, you can see all of the meta tags for Open Graph and Twitter. I have checked the Facebook Debugger and, aside from a few canonical issues, I'm fairly happy with the results.
I also plugged the above URL into a third-party Open Graph Debugger:
http://debug.iframely.com/
and all of the tags for Open Graph, Twitter, and even others come back positive.
Why is it that Twitter's Card Validator is coming back with:
INFO: Page fetched successfully
INFO: 9 metatags were found
ERROR: No card found (Card error)
Any insight on how I get Twitter to display properly?
I think you may either be blocking or redirecting the Twitterbot agent.
I faked a Twitterbot agent with curl:
curl -A 'Twitterbot' 'https://www.halleonard.com/viewpressreleasedetail.action?releaseid=10261' -o ~/Desktop/what-twitterbot-sees.html
This is what your server returns: as you can see, there are 10 meta tags (if you exclude the <meta charset> one), which is what the card validator indicates, and there are no <meta name="twitter:*"> tags at all.
You can reproduce this with your browser if you can set a custom user-agent string; this is possible in Google Chrome's developer tools.
I'm pretty sure there's some sort of redirection rule going on either at your web server level or in your application code.
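Purely as a hypothetical illustration (the pattern and target are invented, not taken from your server), a user-agent-based Apache rewrite of the kind that could cause this would look like:
# Hypothetical rule: serve a different page to the Twitterbot user agent
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC]
RewriteRule ^viewpressreleasedetail\.action$ /press-release-lite.html [L]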
According to developer.twitter.com, the user agent string I have used is correct:
Twitter uses the User-Agent of Twitterbot (with version, such as Twitterbot/1.0)
I checked your source code with w3c validator.
https://validator.w3.org/
It seems that Google Tag Manager is not installed properly: you have to move the Google Tag Manager (noscript) snippet (lines 10-13) to the top of the body section and remove it from the head section entirely.
You can find some information here:
https://support.google.com/tagmanager/answer/6103696?hl=en
and also (more specifically) in your Tag Manager page.
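For reference, the standard installation places the main snippet in the head and the noscript snippet immediately after the opening body tag (GTM-XXXXXXX is a placeholder container ID):
<body>
<!-- Google Tag Manager (noscript) -->
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-XXXXXXX"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
<!-- End Google Tag Manager (noscript) -->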
The error is coming because of the twitter:card type you have selected: you forgot to add the twitter:image content. Both of these should be added:
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:image" content="ADD IMAGE URL">

HTML relative URL resolution difference

I am looking at an older asp.net 2.0 web application. In the master page, there are the following tags:
<link href="style/template.master.css" rel="stylesheet" />
<script src="js/prototype.js" type="text/javascript"></script>
When I saw this, I expected some trouble with pages loading from folders within the site. What I was not expecting was a difference in how the two relative URLs are resolved: viewed from a page in the Admin folder, the link and script URLs resolved against different base paths.
I expected both URLs to be resolved relative to the Admin folder, but they resolved differently. My question is why?
An explanation is fine, but I would really like a reference to the resolution rules that state the difference, or to a bug that could be causing this.
A bit more info:
There are no base tags in either the admin page or master page.
The behavior is the same in both IE 11 (in various compatibility modes) and Chrome 40.
The master page has an XHTML 1.0 Strict doc type.
Thanks
The head section in master pages usually has a runat="server" attribute, and there is an 'Automatic URL Resolution in the <head> Section'; this fixes the URLs for any link tags, but not for the script tags.
Why? I don't know. Here's an article about URLs in Master Pages.
For a reference, you could look at the source, now that asp.net is open source.
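One common workaround (a sketch, not taken from the linked article) is to resolve the script path on the server so it no longer depends on the requesting page's folder:
<!-- In the master page: "~/" resolves against the application root -->
<script src='<%= ResolveUrl("~/js/prototype.js") %>' type="text/javascript"></script>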

disable browser bar when page is loaded

Do you know a way to disable any message bar (Google Translate, the Firefox help bar, ...) that appears when a site page is loaded?
I have noticed that for some sites Google Translate does not pop up, although they don't use code such as <meta name="google" value="notranslate">. Is there some trick in the HTML code, or does it depend on the HTML definition:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
that affects the browser bar?
Well, if a browser vendor decides to do something besides showing HTML content, you can't do anything but try another browser. For instance, if Google Chrome added advertising to its software, you'd need to 'hack' the software in order to remove components from it.
Some browsers offer 'web' extensions, such as Microsoft IE:
<meta http-equiv="imagetoolbar" content="no"> etc etc..
Search the web for 'browser-specific meta'; it might help.
carry on
What you want to do is check that your document is the top-most frame and if not, 'break free' from being displayed in a frame/iframe of another location. Add this code to your documents in the HEAD section:
<script type="text/javascript">
// If this document is not the top-most frame, break out of the enclosing frame.
if (top.location != location) {
    top.location.href = document.location.href;
}
</script>
This won't stop Google Translate from displaying a translated version of your page, for example, but it will make your document 'break free' of the frame that Google sets up (with the ability to change a few settings and such). Hope that's what you were looking for. If you have access to your web server configuration, then also check how to prevent framing and iframe embedding from other domains for your web server. The Apache code for that would be:
#block frame and iframe linking from other domains
Header always append X-Frame-Options SAMEORIGIN
Not sure all browsers respect such headers, though, and you might be forced to use mod_rewrite rules to achieve what you're after.
Cheers!

Why does Chrome incorrectly determine page is in a different language and offer to translate?

The new Google Chrome auto-translation feature is tripping up on one page within one of our applications. Whenever we navigate to this particular page, Chrome tells us the page is in Danish and offers to translate. The page is in English, just like every other page in our app. This particular page is an internal testing page that has a few dozen form fields with English labels. I have no idea why Chrome thinks this page is Danish.
Does anyone have insights into how this language detection feature works and how I can determine what is causing Chrome to think the page is in Danish?
Update: according to Google
We don’t use any code-level language information such as lang attributes.
They recommend you make it obvious what your site's language is.
Use the following, which seems to help, although Content-Language is deprecated and Google says they ignore lang:
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<meta charset="UTF-8">
<meta name="google" content="notranslate">
<meta http-equiv="Content-Language" content="en">
If that doesn't work, you can always place a bunch of text (your "About" page for instance) in a hidden div. That might help with SEO as well.
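A sketch of that suggestion (the markup is illustrative, and whether hidden text influences detection is this answer's assumption rather than documented behavior):
<div style="display:none" lang="en">
<!-- a few paragraphs of English copy, e.g. the contents of your About page -->
</div>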
EDIT (and more info)
The OP is asking about Chrome, so Google's recommendation is posted above. There are generally three ways to accomplish this for other browsers:
W3C recommendation: Use the lang and/or xml:lang attributes in the html tag:
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
UPDATE (previously a Google recommendation, now a deprecated spec, although it may still help with Chrome): meta http-equiv (as described above):
<meta http-equiv="Content-Language" content="en">
Use HTTP headers (not recommended based on cross-browser recognition tests):
HTTP/1.1 200 OK
Date: Wed, 05 Nov 2003 10:46:04 GMT
Content-Type: text/html; charset=iso-8859-1
Content-Language: en
Exit Chrome completely and restart it to ensure the change is detected. Chrome doesn't always pick up the new meta tag on tab refresh.
I added lang="en" to the doctype declaration, added meta tags for charset utf-8 and Content-Language in the HTML header, specified the charset as utf-8 and Content-Language as en in the HTTP response headers, and it did nothing to stop Chrome from declaring my page was in Portuguese. The only thing that fixed the problem was adding this to the HTML header:
<meta name="google" content="notranslate">
But now I've prevented users from translating my page that is clearly in English to their own language. Poor job, Chrome. You can be better than this.
Specify the default language for the document, then use the translate attribute and Google's notranslate class per element/container, as in:
<html lang="en">
...
<span translate="no" class="notranslate">English</span>
Explanation:
The accepted answer presents a blanket solution, but does not address how to specify the language per element, which can fix the bug and ensure your page remains translatable.
Why is this better? This will cooperate with Google's internationalization rather than shutting it off. Referring back to the OP:
Why does Chrome incorrectly determine page is in a different language and offer to translate?
Answer: Google is trying to help you with internationalization, but we need to understand why it is failing. Building off of NinjaCat's answer, we assume that Google reads and predicts the language of your website using an N-gram algorithm, so we can't say exactly why Google wants to translate your page; we can only assume that:
There are words on your page that belong to a different language.
Marking the containing element as translate="no" and lang="en" (or removing these words) will help Google to correctly predict the language of your page.
Unfortunately, most people reaching this post won't know which words are causing the trouble. Use Chrome's built-in "Translate to English" feature (in the right-click context menu) to see what gets translated; you may see unexpected translations.
So, update your HTML with the appropriate translation tags until the Google translation of your page changes nothing; then we should expect the popup to go away for future visitors.
Won't it be a lot of work to add all these extra tags? Yes, very likely. If you are using WordPress or another content management system, then look in its documentation for quick ways to update your code!
Without knowing what the text was, perhaps the n-gram detection is being tricked by the content of your page.
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
https://en.wikipedia.org/wiki/N-gram
Chromium thinks this page is in Filipino: http://www.reyalvarado.com/portfolio/cuba/
Notes: There is pretty much no text on the page except for the owner's name and the menu items. Menu items are dynamically replaced with images by FLIR.
The HTML declares the page as US English:
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
Try adding the attribute xml:lang="" to the <html> element, if the other solutions don't work:
<html class="no-js" lang="pt-BR" dir="ltr" xml:lang="pt-BR">